Preface


Dear reader, 
Dear colleagues in Cyber Security, 


It is our pleasure to present this book and say a few words about the strategy
behind its development and creation. The exponential growth of Internet
interconnectivity has led to a significant growth in cyber-attack incidents
globally, often with severe and disastrous consequences. The rapid development of
more innovative and effective (cyber)security solutions and approaches has become
urgent for (digital/cyber) security professionals, who must create solutions to
detect, mitigate and prevent such grievous consequences.

Therefore, several years ago we, the editors of this book, came together as part
of a core group to brainstorm about creating innovative advanced cyber-threat
intelligence, detection and mitigation solutions. The group was extended with
broader domain experts, comprising in total nine multidisciplinary partner
organisations from seven EU countries and covering all key aspects of a successful
programme. In this way the project Cyber-Trust (Advanced Cyber-Threat
Intelligence, Detection and Mitigation Platform for IoT) was created and submitted
to the European Commission’s H2020 framework programme for evaluation and
potential co-funding. We were thrilled to receive the maximum possible scores on
all the evaluation criteria (1. excellence and the innovative idea, 2. impact and
3. the implementation plan) for our proposal. This led to an amazing three and a
half years of partnership and collaboration to execute and deliver the results as
promised.

As an outcome of this excellent partnership and collaboration, this book is a
synergetic product of many different minds: multidisciplinary and multicultural
professionals in the cybersecurity domain, originating from the Cyber-Trust
project. The idea to produce this book came up during the third year of intensive
collaboration within Cyber-Trust, from the perspective of the exploitation of the
results. And it did not stop there.

We remained open to further ideas and contributions drawn from our experiences
and future professional plans, to make sure that this book will be a valuable
contribution to research and innovation, science and business in the
(digital/cyber) security domain beyond the framework of our current collaboration.

This book provides insights on new security technologies and methods for 
advanced cyber threat intelligence, detection and mitigation. We cover topics 
such as cyber-security and AI, cyber-threat intelligence, digital forensics, moving 
target defense, intrusion detection systems, post-quantum security, privacy and 
data protection, security visualization, smart contracts security, software security, 
blockchain, security architectures, system and data integrity, trust management sys- 
tems, distributed systems security, dynamic risk management, privacy and ethics. 


Wishing you interesting reading! 
Yours sincerely, 
Editors 


Gohar Sargsyan 
Dimitrios Kavallieros 
Nicholas E. Kolokotronis 


Executive Summary 


The book “Security Technologies and Methods for Advanced Cyber Threat
Intelligence, Detection and Mitigation” builds on the experience of the
Cyber-Trust EU project (grant agreement 786698), including its methods, use cases,
technology development, testing and validation, and extends into broader science,
the leading IT industry market and applied research with practical cases.
Cybersecurity is gaining momentum and is scaling up in many areas, as this
publication will show. We provide new perspectives on advanced (cyber)security
innovation (eco)systems, covering the key perspectives. How to build and run them
from the process and skills perspective is of great importance when developing,
applying and scaling up innovative security systems.

This book comprises 12 chapters consisting of independent parts, each of which
provides a complete view both on its own and in connection with the relevant parts
of the book. Below we briefly summarise the contents of each chapter.

In Chapter 1 the authors cover Design and Architecture considerations for 
Advanced Cyber-Threat Intelligence, Detection, and Mitigation Platforms. In par- 
ticular an architectural framework and approach is introduced which guarantees 
better efficiency. 

Chapter 2 explores the procedural aspects, detailing how the impact assessment
process is organised and takes place inside such complex cybersecurity platforms,
with reference to the Cyber-Trust case.

The authors of Chapter 3 outline a system that incorporates and extends current 
tools and techniques from the Cyber Threat Intelligence life-cycle by providing a 
holistic view in the Cyber-Threat Intelligence process. 

Moving Target Defense techniques for mitigating sophisticated IoT (Internet of
Things) threats are the core of Chapter 4, which presents an implementation of an
intrusion response system. The authors also demonstrate that, in the evaluation
results, the system showed high effectiveness against traditional threats and
increased effectiveness against novel threats.

Chapter 5 focuses on cyber-threat detection in the IoT. Here the authors present
a comprehensive overview of the IoT device profiling and threat detection solution
proposed by Cyber-Trust to tackle the grand challenges of securing the IoT
devices’ ecosystem. In addition, the effectiveness and performance of the proposed
solution are verified in depth, especially against botnets and zero-day attacks.

Machine learning can be used to effectively mitigate unknown threats in IoT
honeypots, as described in detail in Chapter 6. The chapter introduces a novel
approach that employs a honeypot and machine learning to detect malicious network
traffic.

In Chapter 7 the authors provide theoretical support for recent developments in
the area of post-quantum cryptography (PQC), aiming at the incorporation of secure
cryptographic primitives into blockchains. The challenges facing both researchers
and industry regarding the implementation of post-quantum algorithms in blockchain
applications are demonstrated.

The authors of Chapter 8 discuss and propose an approach to trust computation in
the Internet of Things, which synthesizes behavioral, device status and associated
risk aspects into a comprehensive trust score that can be consulted to realize
trust-based access control.

Chapter 9 introduces the testing, validation, verification and evaluation
methodology that the Cyber-Trust project followed during the pilot phase of the
project’s lifecycle. In a nutshell, the authors show that collecting and analyzing
data from pilot activities reveals the satisfaction rate of the stakeholders and
the level of the system’s performance.

Moving on from testing and validation to testbeds for business, Chapter 10 is all
about Smart Home testbeds for business. The authors present the results from the
emulated and tested SoHo (Smart Home) platform and their exploitation potential in
several fields, mainly from a business perspective, as well as their impact on
business and potential extensions.

For Chapter 11, we have valuable input from industry leadership discussing how to
secure today’s complex digital realities by introducing the tested and proven CGI
cybersecurity approach for today’s modern work environments.

Last but not least, Chapter 12 of this book is about the security and privacy 
aspects for digital twins, drivers, concerns and recommendations on how to manage 
risks. Practical cases on point are also provided. 


DOI: 10.1561/9781680838350.ch1


Chapter 1 


How to Design and Set Architecture for 
Advanced Cyber-Threat Intelligence, 
Detection, and Mitigation Platforms 


By G. Sargsyan* and R. Binnendijk* 


CGI 
*gohar.sargsyan@cgi.com 
raymond.binnedijk@cgi.com


This chapter demonstrates how to design and set up the architecture for advanced
cyber-threat intelligence, detection and mitigation platforms, following the
example of the Cyber-Trust EU research and innovation project [1] and applying the
proven architecture methodology Risk- and Cost-Driven Architecture (RCDA) [2]. The
RCDA approach has advantages over other approaches, which helped the consortium
partners to agree on it from an early stage of platform design and development.
According to RCDA principles, the architecture work starts with identifying the
architectural concerns with the highest impact in terms of risk and cost, and
addressing those concerns by making architectural decisions. Hence, this chapter
contains the results of the most impactful architectural decisions made. This has
allowed the architecture and requirements processes to mutually benefit from each
other’s progress, and resulted in good cohesion between requirements and
architecture. The price for this cohesion is some rework in maintaining
traceability: in this chapter we introduce the requirements traceability, which is
based on early-stage requirements and further extended with references to the
output of the end-user requirements and the legal, ethical and data protection
frameworks.




The concerns with the highest impact in terms of risk and cost identified at the 
start of the project were especially integration, but also compliance and security. 


Integration is a concern because the cyber-threat intelligence, detection and
mitigation solution is composed of many separate components which are being
developed by various development and research teams. This concern is addressed by
shaping a modular architecture composed of loosely coupled components, where the
interfaces between these components are defined via integration guidelines. In
addition, the architecture includes the approach chosen to develop or otherwise
obtain the deliverable elements that make up the technical solution.


Compliance is an important concern, especially with respect to legal, ethical,
social and privacy rules. This concern is mainly addressed in the Cyber-Trust use
case scenarios, the end-user requirements, the legal and ethical recommendations,
and the impact assessment.


Security is always a key concern in such complex platforms, especially when
designing and developing cyber-threat intelligence, detection, and mitigation
platforms. We address this concern in a way that is aligned with and complementary
to the legal and ethical recommendations.


1.1 Background and Driving Forces 


By establishing an innovative cyber-threat intelligence gathering, detection, and
mitigation platform, as well as by performing high-quality interdisciplinary
research in key areas, the Cyber-Trust project aims to develop novel technologies
and concepts to tackle the grand challenges of securing the ecosystem of IoT
devices. It is structured around three pillars: (a) key proactive technologies,
(b) cyber-attack detection and mitigation, and (c) distributed ledger
technologies, as seen in Table 1.1 below.

To set up the Cyber-Trust platform design and architecture, an iterative approach
was applied so as to have the opportunity to validate and learn from the
architectural decisions made. The following iterative cycles have been applied:


Iterative Cycle 1 — The user requirements and regulatory framework have been 
set up to pave the way for the system design and architecture. During this phase,
emerging trends in cyber-attacks have been identified to guide the definition of
use case scenarios and the collection of the end-user requirements. The regulatory
framework is being analysed, and the impact of the proposed methods on fundamental
rights, data protection and privacy is being assessed. The use cases have been
identified. Iterative Cycle 1 includes the work packages:




• Cyber-threat landscape and end-user requirements;
• Legal issues: data protection and privacy.


Iterative Cycle 2 — Platform design. In this phase, the Cyber-Trust platform 
reference architecture is created, incorporating inputs from the first phase, 
translated into technological tools to be built in Iterative Cycle 3. The tools 
above comprising the integrated platform are being designed and prototyped and 
the consortium is in the initial stage of the platform design. The design and archi- 
tecture of the system is implemented under the work package 


• Cyber-Trust framework, platform design and architecture.


The main outputs of this phase are the platform’s prototype and its final
specifications at the end of the phase, which are associated with the milestone
“Cyber-Trust architecture and design specifications”. In this phase the initial
version of the system design and architecture is set, including the integration
plan. To ensure compliance and security, work on privacy considerations and on
ethical and legal aspects continues in this phase, to review and advise on the
requirements.


Iterative Cycle 3 — Refinement of design and platform architecture. In this iter- 
ative cycle, the Cyber-Trust platform reference architecture is iteratively being 
monitored and refined in parallel with the tools development and during the 
validation of pilots. The tools are being developed, and the architecture is being
refined (if any flaws are found) or revalidated throughout the course of software
development and integration. When the platform is ready, pilots will validate it;
the design and architecture then go through a final stage of revalidation,
incorporating any input that makes the platform architecture more robust.

In setting up this complex design and architecture of Cyber-Trust, we applied an
iterative approach. First, the initial architecture was delivered. Then feedback
was gathered during testing and validation workshops engaging advisory board
members and a focused expert group. This feedback was processed during the work on
the platform reference architecture and design specification. The partners
produced a prototype as a draft version of actual working and partly integrated
software components. Each technical partner contributed to the development and
design, and provided, explained and shared technology that serves as the base
building blocks for implementing the Cyber-Trust platform during the component and
software development cycle. By developing software early in the project, during
the design and architecture iterative cycle, architecture and implementation were
merged early, which provided the advantage of validating and refining the
architecture and jump-starting the implementation to be performed during the core
software development phase.


Table 1.1. Three pillars of Cyber-Trust.

Key proactive technologies      Attack detection and mitigation     Distributed ledger technologies
• cyber-threat intelligence     • advanced targeted attacks         • registration
• cyber-threat sharing          • network infrastructure attacks    • update
• reputation/trust management   • network visualisation             • verification
• security games                • mitigation and remediation        • modelling
                                • forensics evidence collection     • consensus
                                                                    • privacy

Mixture of research and development: The work packages “Key proactive
technologies and cyber-threat intelligence”, “Advanced cyber-attack detection and
mitigation” and “Distributed ledger technology for enhanced accountability” follow
(and partly run in parallel with) the activities of the work package “Cyber-Trust
framework, platform design and architecture” and aim to implement the solution
architecture (Proof of Concept). These implementation activities comprise a
mixture of research and development, where relevant state-of-the-art technology is
identified, used and extended and new tools are custom developed. The research and
technology partners work closely together, with clearly identified roles and
responsibilities, to ensure efficient, high-quality and smooth delivery of the
Cyber-Trust platform.


1.2 Architecture Approach and Methodology 


In this section we provide a brief description of the architecture methodology.


1.2.1 Risk and Cost Driven Architecture Methodology (RCDA)


The consortium has chosen the Risk and Cost-Driven Architecture (RCDA) framework
and approach as the key methodology for architecture design. The advantage of
applying this method is that it supports architectural decision making throughout
the whole design process [3]. Concerns and decisions are weighed throughout the
design process, and stakeholders’ requirements are constantly validated against
the design. The design process is iterative to ensure high-quality results. The
fact that RCDA is a recognized method in the Open Group Certified Architect
program [4] is an extra advantage for the project and the consortium partners,
promoting openness and collaboration on the most efficient way of shaping the
design and architecture.

RCDA practices were applied while initially shaping the Cyber-Trust project. 

The following concrete measures were applied: 
• The architect is involved during the requirements stage to help and guide,
  aiming at improving the connection between requirements and design.

• Architecture is delivered in two increments, with the ability to verify and
  learn:

  o The initial architecture [5] is delivered at an early stage, to be able to
    align requirements engineering with software development and to have the
    opportunity to validate design decisions.

  o This initial architecture is validated by building and testing working
    software, i.e. a rapid prototype [6].

  o The architecture is determined after processing the feedback gathered
    through the initial architecture [5], the rapid prototype assessment [6] and
    validation, and the UI mock-ups demonstration, assessment and validation [7].

• Architecture focuses on critical design decisions and should not over-specify.
  It starts early in the project, but the architectural work does not stop there.
  Technical design and tool selection are performed later in the project, during
  the development stage (Work Packages (WP) 5, 6 and 7 in the Cyber-Trust
  project) and the pilot implementation (WP8), which is the final stage towards
  platform evaluation and validation and therefore towards the final architecture
  of the platform. The architect is involved during these work packages, where
  the architecture is validated and elaborated. The architect will help and guide
  but not lead.

• Legal and ethical recommendations have been provided throughout the iterations
  of architecture work [3].


During the project, at the highest level of abstraction, the architectural
specification process follows a simple workflow loop with three steps:


[Figure: The Architecting Workflow. Legible labels: architectural concerns
(backlog); identify & prioritize architectural concerns; architectural decisions.]


Figure 1.1. RCDA Architectural Micro cycle.
We call this the “Architecture Micro cycle”. This workflow loop is driven by a
backlog of unresolved architectural concerns, resulting from the ARCHITECTURAL
REQUIREMENTS PRIORITIZATION practice. The architectural decisions taken to address
these concerns, resulting from the ARCHITECTURAL DECISION-MAKING practice, are
added to an ever-growing stack of Architectural Decisions.

This microcycle representation is a severe oversimplification. In real life, the 
architectural decisions usually affect more than one concern, and can hardly ever 
be made sequentially. The architect has to make sure that the entire set of decisions 
maximally supports the entire set of concerns. 
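
As a rough illustration of the micro cycle, the following Python sketch keeps a
backlog of concerns, repeatedly picks the one with the highest impact, and records
the decision that addresses it. It is not part of RCDA itself: the class and
function names, the 1 to 5 scoring scale, the scores assigned to the three
concerns, and the approximation of impact as risk multiplied by cost are all
assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Concern:
    """An architectural concern on the backlog (scores are illustrative)."""
    name: str
    risk: int  # assumed scale: 1 (low) to 5 (high)
    cost: int  # assumed scale: 1 (low) to 5 (high)

    @property
    def impact(self) -> int:
        # Assumption: impact is approximated as risk multiplied by cost.
        return self.risk * self.cost


@dataclass
class Decision:
    """An architectural decision and the names of the concerns it addresses."""
    summary: str
    addresses: list[str]


def micro_cycle(backlog: list[Concern],
                decide: Callable[[Concern], Decision]) -> list[Decision]:
    """Pick the highest-impact open concern, decide, record, and repeat."""
    open_concerns = {c.name: c for c in backlog}
    decisions: list[Decision] = []
    while open_concerns:
        top = max(open_concerns.values(), key=lambda c: c.impact)
        decision = decide(top)
        decisions.append(decision)
        # One decision may address several concerns at once.
        for name in {top.name, *decision.addresses}:
            open_concerns.pop(name, None)
    return decisions


# Hypothetical backlog using the three concerns named in this chapter.
backlog = [
    Concern("compliance", risk=4, cost=3),
    Concern("integration", risk=5, cost=5),
    Concern("security", risk=4, cost=4),
]


def decide(concern: Concern) -> Decision:
    # Placeholder for the decision-making practice, which involves stakeholders.
    return Decision(f"decision addressing {concern.name}", [concern.name])


decisions = micro_cycle(backlog, decide)
order = [d.addresses[0] for d in decisions]
print(order)
```

With these assumed scores the concerns are handled in the order integration,
security, compliance. As noted above, real decisions interact, which is why the
sketch lets one decision close several concerns at once.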

In addition to the first two practices (prioritization and decision making) men- 
tioned above, RCDA offers a set of core practices that are applied throughout the 
lifecycle of the project. 


[Figure: RCDA core practices; the only legible label is “Applying Architectural
Strategies”.]


Figure 1.2. RCDA core practices.


For more information about RCDA see [2, 3]. 
Figure 1.3 shows how RCDA practices are applied within the Cyber-Trust project
process.

Practices are applied incrementally, continuously identifying and prioritizing 
concerns and making (and applying and documenting and validating etc.) deci- 
sions to mitigate these concerns. 


[Figure: RCDA practices mapped onto the project phases.
Specify (WP2): architecture structure is included in D2.3.
Design (WP4): architecture is documented in D4.1 and D4.4.
Build (WP5, 6 and 7): architecture structure is included, elaborated on and
validated in the various technical design deliverables.
Test (WP8): final cycles of elaboration and validation.]


Figure 1.3. Indicative high-level overview of RCDA practices applied within the
Cyber-Trust project process.


The Cyber-Trust project is mainly based on a traditional, phased (waterfall)
approach. Although phases overlap and the architect is involved throughout the
entire lifecycle, most of the work is performed in the design phase (WP4).


Figure 1.4. Architectural work is done throughout the project lifecycle.
1.2.2 Architecture Views


The Cyber-Trust solution architecture is shaped by the various architectural
requirements and decisions made, and is documented in a set of views (see
Table 1.2). These views focus on effectively communicating the architecture to the
relevant stakeholders. Around and beyond these views, additional documentation is
provided to complete the technical system design.

The views are detailed in subsequent chapters. 


Table 1.2. Architectural views.

Chapter  View                      Goal
2        Context                   Describe the high-level solution context.
3        Requirements              Identify, understand and prioritise
                                   architecturally significant requirements.
4        Decisions, concerns &     Describe concerns, key decisions and deduced
         deduced architectural     architectural requirements.
         requirements
5        Operational view          Describe how the system behaves in an
                                   operational environment.
6        Delivery breakdown view   Serve as a basis for planning solution delivery.
7        Infrastructure view       Identify and explain hardware, infrastructure
                                   software and deployment aspects of the solution.
8        Data view                 Describe the data that is relevant and how this
                                   data is distributed within the solution.
9        Security view             Describe the set of processes, mechanisms and
                                   components used to make the system secure.


1.2.3 Compliance and Security


Compliance is an important concern, especially with respect to legal, ethical,
social and privacy rules. This concern is mainly addressed in the Cyber-Trust use
case scenarios, the Cyber-Trust end-user requirements and especially the legal and
ethical recommendations, which explain how compliance concerns vary based on the
use cases and the tools to be developed within the Cyber-Trust project. The
architecture will have to be flexible enough to address these variances.

Security is always a key concern in such complex platforms, especially in
designing and developing cyber-threat intelligence, detection, and mitigation
platforms. More details addressing this concern, aligned with and complementary to
the legal and ethical recommendations, are provided later in this chapter.

The approach applied in designing the Cyber-Trust platform is the following: the
requirements (end-user requirements and architectural requirements) should be
legally and ethically compliant. A legal and ethical review has been provided by a
dedicated expert partner throughout the entire duration of designing and
developing the system, from its initial set-up until its validation.


1.3 Solution, Context and Overviews 


1.3.1 Context 


The Cyber-Trust project is built upon three main cyber-security research thrusts,
namely key proactive technologies, cyber-attack detection and mitigation, and
distributed ledger technologies. The proposed approach aims to capture the
different phases of a large-scale cyber-attack, before and after existing (and
possibly unknown) vulnerabilities of devices have been widely exploited by
cyber-criminals to launch the attack. Novel methods and tools will be developed to
deal with the fundamental problems of prevention, detection, and mitigation of
advanced cyber-attacks involving IoT devices and networks.


[Figure: the CYBER-TRUST platform.]


Figure 1.5. High-level solution overview. Source [1].
1.3.2 Static Solution Overview


The Cyber-Trust solution is assembled from four solution areas, indicated by four
colours in Figure 1.6 below:


(1) Platform containing all central services and data (blue) 
(2) A platform with specific ISP services and data (orange) 
(3) An application running on smart-phones (green) 

(4) An application running on smart-gateways (purple) 


[Figure: the four solution areas and the solution boundary. Legible labels:
Dark/Deep/Surface web crawling; System2system API; End-user browser; Solution
boundary; Machine2machine API; Mobile app UI; End-user.]


Figure 1.6. Static Solution Overview & Solution Boundary.


1.3.3 Runtime Solution Overview 


The figure below shows the four Cyber-Trust solution areas at runtime. Each ISP
runs its own ISP-services platform, connecting to the various ISP-related smart
gateways and smart phones. If Law Enforcement Agencies (LEAs) join the platform,
they will run a dedicated platform, which consists of a subset of the ISP
platform. Communication is done via messaging through an Integration Bus.
With regard to Figure 1.7, the following considerations are made:


• Message bus for platform integration and eventing, carrying small (< 15 MB)
  messages (events with inline data).

• Separate integration bus instance per platform instance, to physically separate
  data flows.

• For large messages, in addition to the data flow there is Component2Component
  communication (REST APIs) between the GenericPlatform and the
  ISP-Platform/LEA-Platform, but also between the ISP-Platform and the
  LEA-Platform.

• No integration bus for the crawling interface (S2S API).

• No integration bus for the webportal2platform interface (S2S API).

• Multiple IoT-SP platforms.

• Multiple LEA platforms.

• The LEA platform is a subset of the IoT-SP platform.

• This includes the Visualization Portal.
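
The routing considerations listed above can be sketched as a small channel
selection function. This is only an illustrative reading of the list, not the
Cyber-Trust implementation: the `Message` type, the interface labels, the channel
names and the interpretation of the 15 MB limit as 15 × 1024 × 1024 bytes are
assumptions introduced for the example.

```python
from dataclasses import dataclass

# Assumption: the "small message" limit is read as 15 MiB.
MAX_BUS_MESSAGE_BYTES = 15 * 1024 * 1024

# Hypothetical labels for the interfaces that bypass the integration bus.
S2S_INTERFACES = {"crawling", "webportal2platform"}


@dataclass(frozen=True)
class Message:
    """A message exchanged between platform components (illustrative)."""
    source: str
    target: str
    size_bytes: int
    interface: str = "platform"


def choose_channel(msg: Message) -> str:
    """Return the channel a message would use under the listed considerations."""
    if msg.interface in S2S_INTERFACES:
        # Crawling and web-portal traffic uses the S2S API, not the bus.
        return "s2s-api"
    if msg.size_bytes < MAX_BUS_MESSAGE_BYTES:
        # Small events with inline data travel over the integration bus.
        return "integration-bus"
    # Large payloads go component to component over a REST API.
    return "rest-api"


small = Message("GenericPlatform", "ISP-Platform", size_bytes=2_048)
large = Message("ISP-Platform", "LEA-Platform", size_bytes=40 * 1024 * 1024)
crawl = Message("Crawler", "GenericPlatform", size_bytes=512, interface="crawling")

for m in (small, large, crawl):
    print(m.source, "->", m.target, "via", choose_channel(m))
```

Keeping the decision in one function makes the size threshold and the list of
bus-bypassing interfaces easy to adjust for each platform instance.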


The static and run-time solution overviews together comprise the key architecture
of the Cyber-Trust platform as a cyber-threat intelligence, detection and
mitigation platform. This architecture served as a guide and foundation for the
further development of the project, including all technological work.


1.4 Conclusions 


In this chapter, we presented an approach to setting up the design and
architecture of an advanced cyber-threat intelligence gathering, detection and
mitigation platform. We demonstrated this following the example of Cyber-Trust, a
European Commission H2020 research and innovation project implemented by nine
multidisciplinary partners from seven countries, bringing together the best
practices and experiences of the project partners. The architecture approach
applied in Cyber-Trust is Risk and Cost Driven Architecture (RCDA), chosen for its
advantages over other approaches, which the consortium partners agreed upon at the
project initiation stage of the platform development. According to RCDA
principles, the architecture work starts with identifying the architectural
concerns with the highest impact in terms of risk and cost, and addressing those
concerns by making architectural decisions. Hence, this chapter contains the
results of the most impactful architectural decisions made. This has allowed the
architecture and requirements processes to mutually benefit from each other’s
progress, and resulted in good cohesion between requirements and architecture. The
price for this cohesion is some rework in maintaining traceability: in this
chapter we introduced the requirements traceability, which is based on early-stage
requirements and further extended with references to the output of the end-user
requirements and the legal, ethical and data protection frameworks. We also took
into consideration the concerns with the highest impact in terms of risk and cost
identified at the start of the project, which were especially integration, but
also compliance and security.
Chapter 2


The Cyber-Trust Paradigm of Procedural 
Aspects for Cybersecurity Research 
Impact Assessment 


By O. Gkotsopoulou*


Vrije Universiteit Brussel 
olga.gkotsopoulou@vub.be 


This chapter explores the meta-elements of an impact assessment, what we call the 
procedural aspects, before, during and after. In other words, how the procedure of 
the impact assessment is organised and takes place inside the Cyber-Trust project. 
This article concentrates all the experience gained and lessons learnt so far. The 
structural scheme used in the Cyber-Trust project can serve as a basis for other 
research project consortia which develop innovative solutions in the field, or as a 
starting point for discussion as to how to improve and eventually standardise such
a procedure.


* This chapter is based on the blogpost ‘Procedural Aspects of an Impact
Assessment for Innovative Cybersecurity Systems Research: The Cyber-Trust model’
by Olga Gkotsopoulou, dated 3 December 2020, hosted on https://cyber-trust.eu/
2.1 Introduction and Background


The H2020 Cyber-Trust project aims to foster a holistic and novel cyber-threat 
intelligence gathering, prevention, detection and mitigation platform, to secure 
the complex and ever-growing smart infrastructure, used by millions of people 
daily. The project consortium follows the latest technical innovations as well as 
best practice in the field, observing developments in the applicable legal and 
regulatory framework and investigating other ethical and societal considerations. 
In this regard, from its conception, the Cyber-Trust project has established an 
impact assessment mechanism, with particular focus on data protection and pri- 
vacy, as a cross-disciplinary exercise among its partners consisting of seven consec- 
utive and strongly inter-connected steps. The mechanism corresponds to a data 
protection impact assessment as enshrined in Article 35 GDPR but given the 
complexity of the goal to be achieved, the consortium enhanced the procedure 
with elements of wider impact assessments including broader ethical and societal 
considerations. 

This chapter explores the meta-elements of an impact assessment, what we call 
the procedural aspects, before, during and after. In other words, how the procedure of 
the impact assessment is organised and takes place inside the Cyber-Trust project. 
This article concentrates all the experience gained and lessons learnt so far. The 
structural scheme used in the Cyber-Trust project can serve as a basis for other 
research project consortia which develop innovative solutions in the field, or as a 
starting point for discussion as to how to improve and eventually standardise such
a procedure.


2.2 The Rationale Behind an Impact Assessment in a 
Cyber-security Research Project 


With the entry into force of the General Data Protection Regulation in 2018, Data 
Protection Impact Assessments (or in short, DPIAs) became a legal requirement for 
data controllers regarding specific data processing operations in some contexts. The 
DPIAs refer to the development or deployment of a new system, product or process 
regarding the processing of personal data, for instance in a large-scale or a novel 
manner. They allow risks to be identified well in advance and risk mitigation 
strategies to be explored. 

Impact assessments, however, are not new. Environmental impact assessments 
have been implemented for years. Organisations have been performing privacy 
impact assessments, impact assessments from a societal or ethical point of view 
or even assessments with a particular focus. 


2.3 The Rationale Behind an Impact Assessment in 
Cyber-Trust 


A DPIA was considered necessary in the Cyber-Trust context, apart from the fact 
that it was part of the project’s contractual obligations, for two reasons: 


a. Intended processing after the research, if the system is marketed: As is the 
case with many cybersecurity systems, when fully operational and deployed, 
personal data processing may take place on a large scale. 
This processing quite often will occur with the use of innovative techno- 
logical solutions. In the Cyber-Trust project, novel technologies include the 
use of machine learning, Artificial Intelligence and Distributed Ledger Tech- 
nologies and aim to create a system beyond the current state of the art. Such 
technologies can involve novel forms of data collection and usage, which may 
entail a high risk to individuals’ rights and freedoms. In addition to that, the 
system has a complex constellation of engaged actors (users and end-users), 
ranging from multiple data subjects to telecommunication providers and Law 
Enforcement Agencies. 

b. Intended processing during the research: In the case of the web crawler, 
personal data might be processed without the provision of a privacy notice 
directly to the individual. Given that one part of the crawling service will be 
deployed in a real environment, with little human impact on the choice of 
websites and links that will be accessed, in particular with the use of Artificial 
Intelligence, the possibility of crawling personal data, even momentarily, from 
publicly available sources is not remote. Even though in the Cyber-Trust context, 
the purpose of the collection is neither the identification and profiling of indi- 
viduals nor the collection of personal data as such, in the Guidelines of the 
European Commission concerning ethics and data protection in the Horizon 
2020 projects, the use of web crawling is considered as raising ethical con- 
cerns and thus, a DPIA is listed as an appropriate tool for the identification 
of risks and of potential mitigation measures. 


2.4 Existing Guidance 


The procedural steps intertwine with each other, creating a net of information flows 
inside the consortium, useful for decision and policy making, and a knowledge hub 
for potential stakeholders who in the future may wish to deploy the system. The 
article will not present the actual analysis steps that are expected to take place during 
an impact assessment. As a context dependent process, this can only be defined in 
case-by-case settings. 
Moreover, there is a lot of guidance concerning the substance of an impact 
assessment. The Article 29 Working Party published in 2017 guidelines on Data 
Protection Impact Assessment to enable the common interpretation of Article 35 
GDPR. National Supervisory Authorities of EU Member States have also pub- 
lished guidelines and templates to assist the data controllers, data processors as well 
as researchers and manufacturers to document and assess the on-going, planned or 
envisaged data processing operations. For instance, the French authority (CNIL) 
has a repository with guidance on its website and even a dedicated software [1]. 
The Brussels Laboratory for Data Protection & Privacy Impact Assessments at the 
Vrije Universiteit Brussel has additionally published a series of briefs on the data 
protection impact assessment process in different languages, providing interactive 
templates [2]. In principle, a specific methodology is not suggested in GDPR. This 
allows organisations to use any framework or methodology, as long as it “describes 
the nature, scope, context and purposes of the processing; assesses the necessity, proportion- 
ality and compliance measures; identifies and assesses risks to individuals; and identifies 
any additional measures to mitigate those risks.” 


2.5 The Seven Steps 


[Figure: diagram of the steps of the impact assessment procedure. Recoverable 
box labels: Set the framework: law, regulation, ethics, societal factors; Carry out 
an impact assessment and report the findings; Perform checks before the pilot 
testing; Establish efficient communication channels and monitoring mechanisms; 
Organise a workshop to discuss and validate the findings.] 


2.5.1 First Step: Establishing the Legal and Regulatory 
Framework at the Start of the Project 


The Cyber-Trust consortium is rather inter-disciplinary. Its partners come from 
academia, business, public administration and carry with them different back- 
grounds and experiences in: tech, cybersecurity, policy, law, ethics, industry, trade, 
telecommunications, law enforcement. Therefore, the first step is to bring all these 
partners to reflect upon the context in which a) the Cyber-Trust research will take 
place and b) the future Cyber-Trust system will be deployed. In the first few months 
of the project (first semester) and during the system conceptualisation phase, the 
partners explored thoroughly the impact of the legal and regulatory framework 
based on the very rough initial concept of the project. They did so by studying 
the EU regulatory framework and the national laws applicable in the countries 
where the partners are based, which are of utmost importance in case of a future 
release of the system. In the Cyber-Trust context, i.e., in the cybersecurity context, 
the review focused in particular on data protection and privacy laws, laws 
governing telecommunications, laws in relation to evidence with particular focus on 
electronic evidence, regulation in relation to cybercrime, and ad-hoc regulation or 
policy guidelines with respect to specific technologies deployed during the project 
(DLT systems, machine learning, etc.). This study led to two written reports [3, 4] 
establishing basic concepts and building up to complex and niche discussions. In 
this stage, other legal and ethics requirements were also settled by the consortium, 
for instance the involvement or appointment of data protection officers per par- 
ticipating entity, the preparation of templates, such as informed consent forms 
and information sheets for the participation in research and the processing of per- 
sonal data, whenever necessary and so forth. Those requirements would differ from 
project to project. 


2.5.2 Second Step: First Wide Consultation Among Partners to 
Define Together the Way Forward 


At the beginning of the second semester, and after the partners had thoroughly 
studied the legal and regulatory framework, the first consultation among all technical 
partners took place. The key partners were identified with the help of the Project 
Coordinator and the Technical Manager. Those partners were invited to complete 
a brief questionnaire about the concept of the component they were developing. 
The main aim was to gain a first impression of the desired design and to gather any 
concerns or questions that had emerged from the study of the legal and 
regulatory framework. The result of this consultation was the drafting of a first set 
of general and more concrete recommendations to assist key partners further with 
their concepts and designs [5]. During this period, a number of ad-hoc bilateral 
meetings took place. This process coincided with the discussions about the initial 
architecture and the partners assessed the need for an impact assessment. At this 
stage, the partners also proposed the impact assessment methodology and estab- 
lished its reporting procedure. 


2.5.3 Third Step: Carrying Out, Completing and Reporting 
About the Impact Assessment 


In parallel with the intense negotiations for the finalization of the system architec- 
ture, the partners engaged in an extensive dialogue about how to better incorporate 
the recommendations provided in Step 2, into their envisaged work. The partners 
were again invited to complete individual, tailor-made written questionnaires for 
their components, assessing each of them separately but also in the context of the 
overall system. In practice, the partners were invited to elaborate further on their 
initial concerns and questions, as well as to explicitly state the benefits of the pro- 
posed solutions. 

Those questionnaires included open questions, common for all the components 
as well as specific questions, tailor-made for particular components. This exercise 
consists of two steps: first, the partners visualise the component they develop, their 
research needs, the data processing operations they plan and explain how they aim 
to remain compliant during the project, taking a look at the requirements of each 
data protection principle; second, the partners demonstrate how they envisage their 
component to correspond in general to data protection principles, in case of 
possible future commercialisation. In other words, the assessment referred (a) to the 
intended data processing which would take place during the project; and (b) to 
the intended data processing of a novel technological system which is likely to be 
used by different data controllers to carry out different processing operations. 

Due to the disciplinary variance, the partners also created a glossary of often- 
used terms (for instance, what is a data subject, what is the difference between the 
right to privacy and the right to data protection, etc). The consortium was invited 
to ponder upon which information to collect and why, whether that information 
includes any personal data and why those data are necessary for the purpose they 
have in mind, under which legal basis and for how long they plan or envisage to 
store those data. 

Timing, precision and flexibility are key here: Although partners were provided 
with initial questionnaires, through continuous interaction some questions were 
refined and new questions were added or dropped. All questionnaires made clear 
from the start, in consultation with the Technical Manager and the Project 
Coordinator, who was in charge of providing a response: that is, the technical partners 
having a leading role in the design of a particular data processing operation and 
the non-technical partners who should be consulted due to the weight of their 
expertise in the project. On some occasions, partners were encouraged to consult 
external experts and their own Data Protection Officers. 

Depending on the system in question, and as will often be the case for cybersecurity 
systems, the procedure of mapping all the data processing operations, from the user 
interface to all the backend sources and databases, may be dynamic, lengthy, 
highly collaborative, rather interactive, intense and resource-demanding. This is 
why it is advised to initiate it as soon as possible and in any case before the intended 
processing. It is to be noted that this procedure is not a one-time exercise but, as a 
living instrument, will take place alongside the planning, development, validation 
and actual implementation phases. 

The outcome of this initial process in the Cyber-Trust case was a written report, 
which consisted of summaries of all partners’ responses, a set of guidelines per com- 
ponent, a data processing matrix per component and a risk assessment matrix per 
component and for the overall project. The full questionnaires as filled in by the 
partners were also added as an Annex at the end of the written report, in case partners 
wish to search for a clarification or for details not included in the main report, in 
line with transparency requirements. 


2.5.4 Fourth Step: Workshop to Discuss and Validate the 
Impact Assessment Outcomes 


After the completion of the first impact assessment and the publication of the 
outcomes, an ad-hoc workshop was organised in plenary to discuss the impact 
assessment outcomes and draw the attention of the key decision makers inside the 
consortium to them. The primary aim of the workshop was to reflect upon and clarify common 
misconceptions that were observed during the impact assessment procedure, to 
recall the legal and ethical requirements and ultimately to examine the substan- 
tial scope and outcomes of the first impact assessment and evaluate its procedural 
aspects. The workshop was also the starting point for the preparation of the 
subsequent review of the impact assessment to be completed at the end of the project 
and coincided with the preliminary deliberation of the system workflows. 


2.5.5 Fifth Step: Continuous Communication During the 
Development 


From the beginning of the project and throughout its whole duration, the non- 
technical partners have been participating in regular managerial and technical meet- 
ings and have been monitoring the development process. All partners have been 
encouraged to contact the legal partners when they have questions or concerns, and 
the legal partners in turn follow the legal and regulatory developments and provide 
updates when a change in a law with a potential impact for the Cyber-Trust system 
occurs, or new case law emerges. Multiple discussions among individual partners, 
the Technical Manager and the Project Coordinator, have led to the drafting of col- 
lective papers and books, investigating inter-disciplinary topics of global interest. 
Such topics include, but are not limited to: data protection by design for cyber- 
security systems in smart homes, privacy preserving mechanisms in Distributed 
Ledger Technology systems, privacy and data protection in the Internet of Things 
ecosystem and so forth. Those initiatives not only improve the consortium’s 
understanding of complex issues, but also further advance debates 
in the field, mobilising the attention of researchers, stakeholders and citizens with 
the organisation of public seminars and events, as well as forming synergies with 
other research projects. Moreover, an important element in the Cyber-Trust project 
is that, in order to ensure that the impact with respect to the legal and regulatory 
framework will be effectively taken into consideration, the consortium has addi- 
tionally established a number of so-called ‘legal and ethics’ Key Performance Indi- 
cators (KPIs). For example, the partners have to work towards the realisation of a 
specific KPI which establishes the minimum number of privacy-preserving mea- 
sures the system should include by default. 


2.5.6 Sixth Step: Check Before the Pilots 


Before the pilots, key partners were invited to perform a final check that all condi- 
tions in relation to compliance were met. This includes having readily available 
important documentation, such as research participants’ information sheets and 
consent forms, resuming and completing communication with their Data Protec- 
tion Officers or Ethics committees and receiving any kind of necessary permissions 
or authorisations as well as reviewing and finalising the data flows. 


2.5.7 Seventh Step: Review and Second Assessment Report 


Near the end of the project life cycle, a review of the impact assessment report is 
planned. The aim of the review is to assess the efforts of the partners to incor- 
porate the outcomes of the first impact assessment during the design and actual 
implementation in pilot-testing, conduct a comparative risk assessment based on 
the initial risk assessment matrix and reflect upon any new issues which may have 
emerged due to technical or regulatory updates in the period between 
the first and the second report. During the review, given the maturity of the pilot 
results, the consortium will first examine whether more components (compared to 
the first report) should be assessed or whether components which were excluded 
from the first report should be now assessed. During the review, the consortium 
will also aim to address issues identified during Step 4, for instance further improving the 
understanding between the technical and non-technical partners with the expan- 
sion of the established glossary and optimising the methodology. Targeted, tailor- 
made questionnaires will be used again at this stage and bilateral discussions with 
the partners will take place. The results will be compiled in a written report, which 
along with the technical documentation, will accompany the final Cyber-Trust 
platform in case of potential marketing. This documentation will allow interested 
stakeholders and future data controllers to understand the benefits and risks of 
the platform and perform their own assessment, having a solid basis as a starting 
point. 


2.6 Lessons Learnt 


Of paramount importance is planning ahead, starting early enough, including a first 
outline in the research proposal. Then, as this is a horizontal procedure, the proper 
tools and mechanisms (e.g., questionnaires, repositories, glossaries, reports) should 
be identified and used to keep the consortium informed and engaged throughout 
the project life cycle. 


2.7 Concluding Remarks 


To sum up, even though structures for an impact assessment may show similarities, 
for the most part they remain tailor-made for each project or system and their particular 
needs, as well as for the decision making they correspond to. The same goes for the 
procedural aspects. As we saw, the procedural aspects of an impact assessment are 
equally important to the substance of it, with regards to its effective and efficient 
completion and regular review. Here we presented the procedural approach adopted 
by the Cyber-Trust project, which constitutes a complex cross-disciplinary system 
with diverse beneficiaries, broken down into seven steps. In the long term, impact 
assessments can have further benefits, including broader compliance and assistance 
with demonstrating accountability and enhancing the trust of individuals and 
users. 


In today’s world, technology has become ever-present and more accessible than ever 
via a plethora of different devices and platforms ranging from company servers 
and commodity PCs to mobile phones and wearables, used for interacting with 
and interconnecting a wide range of stakeholders such as households, organiza- 
tions and critical infrastructures. The volume and variety of the different operating 
systems, the device particularities, the various usage domains and the 
accessibility-ready nature of the platforms create a vast and complex threat landscape that is 
difficult to contain. Trying to stay on top of these evolving cyber-threats has become 
an increasingly difficult task, and timeliness in the delivery of relevant cyber-threat 
related information is essential for appropriate protection and mitigation. Such 
information is typically derived from collected data, and includes zero-day 
vulnerabilities and exploits, indicators (system artifacts or observables associated with an 
attack), security alerts, threat intelligence reports, as well as recommended security 
tool configurations. It is often referred to as Cyber-Threat Intelligence (CTI) and 
entails the collection, analysis, leveraging, management and sharing of huge 
volumes of data. In this chapter, we outline INTIME, a system that incorporates and 
extends current tools and techniques from the CTI life-cycle by providing a 
holistic view of the Cyber-Threat Intelligence process. Through this process the reader 
will be able to (i) identify a number of modern tools and technologies related to 
the CTI life-cycle mentioned above, (ii) detect issues and research challenges that 
are involved in the design of key technologies for pre-reconnaissance Cyber-Threat 
Intelligence, and (iii) plan follow-up activities that will allow the adoption of the 
latest advances in the field. 


3.1 Introduction 


Over the years cyber-threats have increased in numbers and sophistication; adver- 
saries now use a vast set of tools and tactics to attack their victims with their moti- 
vations ranging from intelligence collection to destruction or financial gain. Thus, 
organisations worldwide, from governments to public and corporate enterprises, 
are under constant threat by these evolving cyber-attacks. Lately, the utilisation of 
Internet-of-Things (IoT) devices in a number of applications, ranging from home 
automation to monitoring of critical infrastructures, has created an even more com- 
plicated cyber-defence landscape. The sheer number of IoT devices deployed glob- 
ally, most of which are readily accessible and easily hacked, allows threat actors to 
use them as the cyber-weapon delivery system of choice in many of today’s cyber- 
attacks, ranging from botnet-building for Distributed Denial-of-Service (DDoS) 
attacks to malware spreading and spamming. 

Trying to stay on top of these evolving cyber-threats has become an increasingly 
difficult task, and timeliness in the delivery of relevant cyber-threat related infor- 
mation is essential for appropriate protection and mitigation. Such information is 
typically leveraged from collected data, and includes zero-day vulnerabilities and 
exploits, indicators (i.e., system artifacts or observables associated with an attack), 
security alerts, threat intelligence reports, as well as recommended security tool 
configurations, and is often referred to as Cyber-Threat Intelligence (CTI). To this 
end, with the term CTI we typically refer to any information that may help an 
organisation identify, assess, monitor, and respond to cyber-threats. In the era of 
big data, it is important to note that the term intelligence does not typically refer 
to the data itself, but rather to information that has been collected, analysed, 
leveraged and converted into a series of actions that may be followed, i.e., has become 
actionable. 

Figure 3.1. The CTI life-cycle. 

The CTI cycle, illustrated in Figure 3.1, is the process of generating and 
evaluating CTI. The first step of this process is CTI source identification. It pertains to 
the identification of threat information that needs to be collected from monitor- 
ing devices, feeds, and security repositories to support decision-making and raise 
cyber-security awareness. The next step, namely CTI gathering, is the collection 
of the necessary data from the identified sources, along with the tools for 
extracting a wide variety of information, such as tactical and strategic information. This 
process is not a one-time action, but is performed in a continuous manner. The main goal at 
this stage is to collect as much information as possible and allow correlations and 
further analysis. The third step is CTI analysis and is built upon the information 
that has been collected; it includes both automated and human-driven analysis. 
The fourth step is CTI sharing to the relevant stakeholders, i.e., the entities that 
can utilize the generated intelligence, in a form that they find to be appropriate, 
useful, and in many cases actionable. This makes sharing highly-dependent on the 
audience (e.g., tactical, operational, and strategic level). CTI review (also referred 
to as CTI feedback), which is the last step in the above process, constitutes the key 
to the continuous improvement of the generated intelligence. 

To support the CTI life-cycle outlined above, Koloveas et al. presented 
INTIME [1], an integrated framework for Threat Intelligence Mining and 
Extraction that encompasses key technologies for pre-reconnaissance CTI 
gathering, analysis, management and sharing through the use of state-of-the-art tools and 
technologies. INTIME is an approach that holistically supports the complete CTI 
lifecycle via an integrated, simple-to-use, yet extensible framework and supports 
the task of gathering, consolidating and managing CTI from deep web forums or 
marketplaces and clear web social platforms, leveraging this information to iden- 
tify emerging threats, zero-day vulnerabilities and new exploits to IoT devices. The 
main objective of this chapter is to provide an overview of the architecture and 
implementation of the various tools, methods and algorithms utilised, developed, 
and tested in INTIME. More specifically, we focus on INTIME’s components that 
support: 


• Deciding if a crawled website contains useful CTI; this is achieved by ranking 
the collected content to assess its relevance and usefulness to the task at hand. 

• Extracting CTI from the collected content that was classified as useful, by 
resorting to state-of-the-art natural language understanding and named entity 
recognition techniques. 

• Managing and sharing collected CTI via a combination of custom-made 
and widely adopted, state-of-the-art solutions that allow the exploration, 
consolidation, visualization, and seamless sharing of CTI across different 
organizations. 


INTIME has been designed and developed entirely on open-source 
software, including an open-source focused crawler, an open-source implementation 
of word embeddings for the latent topic modeling, open-source natural language 
understanding tools, and open-source datastores for the storage of the topic models 
and the crawled content. 


3.2 INTIME Architecture 


INTIME’ architecture consists of three major components, namely (a) Data Acqui- 
sition, (b) Data Analysis and (c) Data Management and Sharing. The Data Acquisi- 
tion module is responsible for the monitoring and crawling of various web resources. 
This task is achieved by employing traditional crawling and scraping techniques, 
along with machine learning-assisted components to direct the crawl to relevant 
sources. Although this module can easily extract information from specific well- 
structured sources, further analysis is required when it comes to the web content 
crawled from unstructured or semi-structured sources. To further analyse the gath- 
ered content, the Data Analysis module hosts two machine learning-based submod- 
ules, the Content Ranking submodule, which acts as an internal filter that ranks 
the data according to their relevance to the topic at hand, and the CTI Extraction 
submodule, that employs several information extraction techniques to extract useful 
information from the webpages that were deemed relevant. The idea behind this 
two-stage approach stems from the inability of a simple crawler to accurately model 
the openness of the topic. The difficulty emerges from websites that, although rel- 
evant to the topic (e.g., discussing IoT security in general), have no actual infor- 
mation that may be leveraged to actionable intelligence (e.g., do not mention any 
specific IoT-related vulnerability). After the analysis, the extracted information is 
passed to the last module, named Data Management and Sharing, which hosts and 
dispenses all the Cyber-Threat Intelligence that the system collects. This 
architecture was initially developed by Koloveas et al. [2] and focused on the 
crawling and ranking tasks. It was later extended to its present state through the 
INTIME framework [1]. 
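
The two-stage idea described above (rank first, extract only from what ranks as relevant) can be sketched in a few lines. This is a minimal illustration under stated assumptions and not INTIME's implementation: the three-dimensional vectors are made-up stand-ins for learned word embeddings, averaging word vectors and comparing them by cosine similarity is one common way to score relevance, and the names `EMBEDDINGS`, `relevance` and `two_stage` are hypothetical.

```python
import math

# Toy 3-dimensional vectors standing in for learned word embeddings
# (assumption: real embeddings are trained on a corpus and much larger).
EMBEDDINGS = {
    "vulnerability": [0.9, 0.1, 0.0],
    "exploit": [0.8, 0.2, 0.0],
    "camera": [0.6, 0.3, 0.1],
    "recipe": [0.0, 0.1, 0.9],
    "pasta": [0.0, 0.2, 0.8],
}


def mean_vector(words):
    """Average the vectors of the known words; zero vector if none are known."""
    vecs = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity of two vectors, defined as 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def relevance(document, topic_words):
    """Score a document against a topic described by a list of words."""
    return cosine(mean_vector(document.lower().split()), mean_vector(topic_words))


def two_stage(documents, topic_words, threshold, extract):
    """Stage 1 keeps documents scoring at least threshold; stage 2 extracts."""
    relevant = [d for d in documents if relevance(d, topic_words) >= threshold]
    return [extract(d) for d in relevant]


TOPIC = ["vulnerability", "exploit"]
kept = two_stage(["camera exploit", "pasta recipe"], TOPIC, 0.5, str.upper)
```

Here `str.upper` merely stands in for the extraction stage; the point of the design is that the more expensive extraction step only runs on content that passed the relevance filter.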

Notably, Machine Learning and Deep Learning have a central role in our 
architecture, as the entire Data Analysis module is built upon Deep Learning tech- 
niques such as Word Embeddings (content ranking) and Named Entity Recog- 
nition (CTI extraction). Also, the Data Acquisition module utilises traditional 
Machine Learning algorithms to classify content gathered by the Crawling and 
Social Media Monitoring submodules. In Figure 3.2, every module where Machine 
Learning methods are present is enclosed in dashed lines. 

Figure 3.2. A high-level view of the system’s architecture. 


3.3 Data Acquisition Module 


Currently, useful cyber-security related information that may be turned into 
actionable intelligence can be found in a wide variety of online sources, ranging 
from technical and security-focused blogs in the clear web, discussions between 
experts in specialised security forums, social media content, to underground dark 
web hacker forums and marketplaces selling cybercrime tools and zero-day vul- 
nerabilities and exploits. To cover this widespread need for data acquisition, our 
architecture provides a flexible yet powerful Data Acquisition Module that is con- 
ceptually separated into four distinct submodules: 


1. The Crawling submodule allows users to easily set up and deploy automated 
data collection crawlers that are able to navigate the clear, social, and dark 
web to discover and harvest content of interest. The Crawling submodule 
allows the user to select between a wide variety of options including focused 
(also referred to as topical) crawling directed by appropriate machine learning 
methods, downloading of entire domains based on powerful, yet easy to set up 
in-depth crawlers, TOR-based dark web spidering, and semi-automated 
handling of authentication methods based on cookie management. After 
collecting the content of interest, the users may then use the rest of the modules 
provided by our architecture to process it further and extract useful CTI from it. 
The Crawling submodule is discussed in more detail in Section 3.3.1. 

2. The Social Media Monitoring submodule allows users to monitor popu- 
lar social media streams for content of interest; to do so it utilises publicly 
available APIs from social platforms and provides a pre-trained, ready-to-use 
set of classification algorithms that may be used to distinguish between rele- 
vant and non-relevant content. The Social Media Monitoring submodule is 
elaborated on in Section 3.3.2. 

3. The Feed Monitoring submodule allows the users to monitor structured 
JSON or RSS-based data feeds from established sources such as NIST, while 
allowing them to modify several monitoring parameters like the monitoring 
interval and the type of objects they are interested in (e.g., CVEs, CPEs, or 
CWEs). 

4. The Targeted Web Scraping submodule provides access to structured data from 
reputable sources that do not provide a data feed capability. Inclusion of such 
sources is out-of-the-box for the end user; however, due to the nature of the web 
scraping task, incorporating new ones involves a certain level of technical effort. 
To support this process, our architecture offers a pre-installed set of tools that 
may be used to assist the programmer, including standard HTML parsing, 
XPath querying and JavaScript handling tools and libraries. 

All the data extracted from the Crawling and Social Media Monitoring 
submodules are stored in an internal NoSQL database (MongoDB), where they are 
analysed by the Content Ranking and CTI Extraction submodules of the Data Analysis 
module. Afterwards, they are sent to the Data Management and Sharing module 
in the form of structured CTI. Notice that data collected by the Feed Monitoring 
and Targeted Web Scraping submodules from the structured data sources are directly 
stored to the Data Management and Sharing module since they require no further 
processing. 

In the following sections, we elaborate further on the submodules that were out- 
lined above. 


3.3.1 The Crawling Submodule 


The Crawling submodule implements several distinct services that may be invoked 
by the users to initiate automated data collection on a wide variety of online sources 
in the clear, social, or dark web; the underlying crawling infrastructure is built on 
NYU's ACHE crawler.¹

The focused crawling functionality uses the SMILE Page Classifier [3], which
relies on a Machine Learning text classifier, trained on a selection of positive
and negative examples of webpages, to direct the crawl towards topically
relevant websites (in our case, websites with content relevant to cyber
security). This functionality can also be assisted by the SeedFinder [4]
sub-component, which helps locate initial seeds for the focused crawl by
combining the classification model with a user-provided query relevant to the
topic.

In-depth crawling is essentially a domain downloading operation based on the
ACHE crawler, which traverses a specific domain (like a forum or a website) in
a breadth-first manner and downloads all webpages therein. To direct the crawl
to specific parts of the domain, regex-based filters are used; these filters
provide blacklisting and whitelisting functionality to steer the crawler away
from or towards specific sections of the domain, respectively. In this way the
user may instruct the crawler to avoid downloading non-informative pages (e.g.,
members areas, login or help pages) or to actively direct it to specific
discussion threads in a forum.
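
The blacklisting and whitelisting behaviour described above can be illustrated with a minimal sketch. It is not the ACHE configuration format; the function name `should_crawl`, the domain `forum.example.com` and both pattern lists are hypothetical and only show how such regex-based filters could be combined.

```python
import re

# Hypothetical patterns; a real deployment would tailor them to the target forum.
WHITELIST = [re.compile(r"^https?://forum\.example\.com/threads/")]
BLACKLIST = [re.compile(r"/(login|help|members)(/|$)")]


def should_crawl(url, whitelist=WHITELIST, blacklist=BLACKLIST):
    """Return True if the URL passes the blacklist and whitelist filters."""
    # Blacklist patterns always win: they steer the crawler away from a section.
    if any(pattern.search(url) for pattern in blacklist):
        return False
    # An empty whitelist is treated as "allow everything not blacklisted".
    return not whitelist or any(pattern.search(url) for pattern in whitelist)
```

Giving the blacklist priority is a design choice of this sketch: a login page nested inside an otherwise whitelisted section is still skipped.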

Dark web crawling is also supported by the INTIME architecture. This
functionality relies on TOR proxies to visit the user-specified onion links.
Note that the user is not required to have any experience in this procedure, as
all required actions (i.e., joining the TOR network, using the proxy,
initialising the crawler) are triggered automatically via internal API calls.


1. https://github.com/ViDA-NYU/ache

Any authentication issues that may arise during crawling are resolved via
manual user login (the first time the crawler encounters an authentication
barrier) and session cookie storage for all subsequent crawler visits.


3.3.2 The Social Media Monitoring Submodule 


The Social Media Monitoring submodule focuses on real-time event detection from 
social streams using state-of-the-art tools from the data science domain to automat- 
ically classify posts as related or unrelated to a user-defined topic. To gather data 
from social media streams the submodule utilises the provided social platform APIs; 
the user is able to specify a set of social media accounts and/or a set of keywords 
that are of interest and the content collection mechanism will retrieve (in a recur- 
ring publish/subscribe fashion) all content posted from those accounts or matching 
the provided keywords. 

Subsequently, the user is able to classify the retrieved content as related or unre- 
lated to the task by simply selecting among various popular classification algorithms 
including (multinomial) Naive Bayes, K-Nearest Neighbors, decision trees, ran- 
dom forests, logistic regression, SVMs, as well as proven deep learning models like 
Convolutional Neural Networks. All classification and machine learning algorithms 
come pre-trained on real-world data and with default parameter setups for security 
classification tasks, but users may modify both the training data and setup param- 
eters to fit their specific classification needs. The above process is streamlined to be 
usable out-of-the-box, but the advanced user may also customize all parts of the 
process, including content acquisition from social media without an API, intro- 
duction of other classification or machine learning algorithms, and task-specific 
algorithm training. 
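
To make the related/unrelated classification step more concrete, the following is a minimal sketch of one of the listed algorithms, multinomial Naive Bayes with add-one smoothing, written in plain Python. It is not the submodule's implementation; the function names and the four training posts are hypothetical and far smaller than any realistic training set.

```python
import math
from collections import Counter, defaultdict


def train_nb(samples):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in samples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest smoothed log-probability."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data.
TRAINING = [
    ("new botnet exploits iot cameras", "related"),
    ("ddos attack hits dns provider", "related"),
    ("great pasta recipe tonight", "unrelated"),
    ("watching the football game", "unrelated"),
]
MODEL = train_nb(TRAINING)
```

Swapping in a different algorithm only requires replacing `train_nb` and `classify`, which mirrors the customisation option mentioned above.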


3.3.3 Submodules for Monitoring Structured Sources 


Apart from the unstructured and semi-structured data that are gathered by 
the functionalities mentioned above, our system can also be supplemented by 
structured data from reputable sources of CTI. Such sources can be divided
into two main categories. The first category provides structured data feeds of
its information collections. The second category does not provide data feeds
but exposes the contents of its databases on web-based UIs in a structured
manner. Our architecture provides functionalities to extract information from
both categories.

For the first category, the system utilises standard JSON/XML parsing tech- 
niques with variable monitoring periods dependent on the data feed’s update 
frequency. Such sources include the NVD² and JVN³ vulnerability data stores,
which provide their data in JSON and XML data feeds respectively. For the
second category, several scraping techniques have been implemented, providing
a flexible set of
tools to account for the different types of websites where such information exists. 
These techniques range from standard HTML parsing and XPath querying, to 
sophisticated WebDrivers for automatic form manipulation and dynamic pop-up 
dismissal. Sources in this category include the KB-Cert⁴ and VulDB⁵
vulnerability data stores and Exploit-DB,⁶ which is a CVE-compliant archive of
public exploits and the corresponding vulnerable software.

As previously mentioned, the data acquired from these types of sources are 
inserted directly into the Data Management and Sharing module without passing 
through the Data Analysis modules, since they are already structured in the desired 
CTI form. 
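
As an illustration of the first category, the sketch below parses a JSON feed and keeps only the object types a user is interested in. The feed layout (an `items` list with `id` and `type` fields) is an assumption made for this example and does not reproduce the schema of NVD, JVN or any other real feed.

```python
import json


def extract_entries(feed_text, wanted_types):
    """Parse a JSON feed and keep entries whose 'type' is in wanted_types."""
    feed = json.loads(feed_text)
    return [
        {"id": item["id"], "type": item["type"]}
        for item in feed.get("items", [])
        if item.get("type") in wanted_types
    ]


# Hypothetical feed content used only for illustration.
SAMPLE_FEED = json.dumps({
    "items": [
        {"id": "CVE-2017-5638", "type": "CVE"},
        {"id": "CWE-20", "type": "CWE"},
        {"id": "cpe:/a:apache:struts", "type": "CPE"},
    ]
})
```

A monitoring loop would call such a function once per monitoring interval and forward the result to the storage layer.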


3.4 Data Analysis Modules 


Deciding if a collected website contains useful Cyber-Threat Intelligence is a chal- 
lenging task, given the typically generic nature of many websites that discuss general 
security issues. To tackle this problem, we created an additional processing layer that 
initially ranks the collected content to assess its relevance and usefulness to the task 
at hand (Content Ranking submodule) and then attempts to extract actionable CTI 
from the highest ranked documents (CTI Extraction submodule). 


3.4.1 The Content Ranking Submodule


The idea behind our ranking approach was to represent the topic as a vocabulary 
distribution by utilising distributional vectors of related words; for example, a topic 
on IoT security could be captured by related words and phrases like “Mirai botnet”, 
“IoT”, or “exploit kits”. Such salient phrases related to the topic may be obtained 
by un-/semi-supervised training of latent topic models over external datasets such 
as IoT and security-related forums. In this way, we are able to capture
semantic dependencies and statistical correlations among words for a given
topic and represent them in a low-dimensional latent space. To do so, we used
Word2Vec [5], a shallow, two-layer neural network that can be trained to
reconstruct linguistic contexts and map semantically similar words close
together in the embedding space. Each word is represented in this space as a
word embedding. These embeddings capture the relationships between the words
in the dataset, making vector arithmetic possible. This method, combined with
a method for mapping the words to our topic (discussed below), allows us to
create a Topic Vocabulary.


2. https://nvd.nist.gov/
3. https://jvndb.jvn.jp/en/
4. https://www.kb.cert.org/vuls/
5. https://vuldb.com/
6. https://www.exploit-db.com/


Topic Vocabulary. To train the Word2Vec model, we had to create an appropriate
dataset for the Content Ranking task. Our dataset had to contain the common
vocabulary that is utilised when the topics of IoT and Security are being
discussed. To capture this vocabulary, we resorted to a number of different
discussion forums within the Stack Exchange ecosystem. To this end, we utilised
the Stack Exchange Data Dump⁷ to get access to IoT and security-related
discussion forums including Internet of Things,⁸ Information Security,⁹
Arduino,¹⁰ and Raspberry Pi.¹¹ The last two were selected because they are the
most prominent devices for custom IoT projects with very active communities,
so their data would help our model to better incorporate the technical IoT
vocabulary. The utilised data dumps contain user discussions in Q&A form,
including the text from posts, comments and discussion-specific tags in XML
format. The posts and comments were used as the main input for the model.

In many cases, the words of the trained model were too generic or off-topic;
thus, a method was needed to remove those words and create a smaller, more
robust, topic-specific vocabulary. To do so, we utilised the extracted
tags and augmented them with the set of N most related terms in the latent space 
for each tag. Table 3.1 shows an example of the most relevant terms to the DDoS 
user tag, for N = 5, 10, 15. 
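
The tag augmentation step can be sketched as follows, assuming the trained embeddings are available as a plain dictionary from word to vector. The three-dimensional toy vectors and the function names are hypothetical; a real Word2Vec model would have far more dimensions and a much larger vocabulary.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_tags(tags, embeddings, n):
    """Return the tags plus, for each tag, its n nearest terms in the latent space."""
    vocabulary = set()
    for tag in tags:
        if tag not in embeddings:
            continue
        vocabulary.add(tag)
        others = [word for word in embeddings if word != tag]
        others.sort(key=lambda w: cosine(embeddings[tag], embeddings[w]), reverse=True)
        vocabulary.update(others[:n])
    return vocabulary


# Hypothetical toy embeddings for illustration only.
TOY_EMBEDDINGS = {
    "ddos": [0.9, 0.1, 0.0],
    "flooding": [0.8, 0.2, 0.1],
    "volumetric": [0.85, 0.15, 0.0],
    "recipe": [0.0, 0.1, 0.9],
}
```

With these toy vectors, `expand_tags(["ddos"], TOY_EMBEDDINGS, 2)` keeps the two security-related neighbours and leaves out the off-topic word.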


Table 3.1. Most Relevant Terms for Tag “DDoS”.

Rank   Term
#1     volumetric
#2     dos
#3     flooding
#4     flood
#5     sloloris
#6     denial_of_service
#7     cloudflare
#8     prolexic
#9     floods
#10    aldos
#11    slowloris
#12    Ip_spoofing
#13    loic
#14    drdos
#15    zombies


7. https://archive.org/details/stackexchange
8. https://iot.stackexchange.com/
9. https://security.stackexchange.com/
10. https://arduino.stackexchange.com/
11. https://raspberrypi.stackexchange.com/


Ranking Engine. Since useful CTI manifests itself in the form of cyber-security
articles, user posts in security/hacker forums, or advertisement posts in
cybercrime marketplaces, it can also be characterised as distributional vectors
of words. That way, we can compare the similarity between the distributional
vectors of the harvested content and the given topic to assess the relevance
and usefulness of the content.

To do so, we employ the Ranking Engine sub-component. This component first 
creates the Topic Vector, by utilising the resulting Topic Vocabulary and then creates 
a Post Vector for each post entry in the crawled collection. 

The Topic Vector $T$ is constructed as the sum of the distributional vectors of
all the topic terms $t_i$ that exist in the topic vocabulary, i.e.,

$$T = \sum_{\forall i} t_i$$


Similarly, the Post Vector $P$ is constructed as the sum of the distributional
vectors of all the post terms $w_j$ that are present in the topic vocabulary.
To promote the impact of words related to the topic at hand, we introduce a
topic-dependent weighting scheme for post vectors in the spirit of [6]. Namely,
for a topic $T$ and a post containing the set of words $\{w_1, w_2, \ldots\}$,
the post vector is computed as

$$P = \sum_{\forall j} \cos(w_j, T)\, w_j$$

Table 3.2. Relevance Score computation.

Excerpt from: www.iotforall.com/5-worst-iot-hacking-vulnerabilities

The Mirai Botnet (aka Dyn Attack) Back in October of 2016, the largest DDoS attack
ever was launched on service provider Dyn using an IoT botnet. This led to huge portions
of the internet going down, including Twitter, the Guardian, Netflix, Reddit, and CNN.

This IoT botnet was made possible by malware called Mirai. Once infected with Mirai,
computers continually search the internet for vulnerable IoT devices and then use known
default usernames and passwords to log in, infecting them with malware. These devices
were things like digital cameras and DVR players.

Relevance Score: 0.8563855440900794


Figure 3.3. Theoretical visualization of the computation process.
[Figure: the topic vector T plotted with the word vectors "botnet", "Mirai",
"malware" and "ddos".]


Finally, after both vectors have been computed, the Relevance Score $r$ between
the topic $T$ and a post $P$ is computed as the cosine similarity of their
respective distributional vectors in the latent space:

$$r = \cos(T, P)$$


Having computed a relevance score for every crawled post in our datastore, the 
task of identifying relevant/useful information is trivially reduced to a mixture of 
thresholding and top-k selection operations. 

Table 3.2 displays an example of the process followed by the component. 
Figure 3.3 shows a theoretical visualization of the computation process. 
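
The whole ranking computation (topic vector, weighted post vector, relevance score, and the final thresholding and top-k selection) can be summarised in a short sketch. The two-dimensional embeddings, the vocabulary and all function names are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def vector_sum(vectors, dim):
    """Element-wise sum of vectors; a zero vector if none are given."""
    total = [0.0] * dim
    for vector in vectors:
        total = [a + b for a, b in zip(total, vector)]
    return total


def topic_vector(vocabulary, embeddings, dim):
    """T: the sum of the vectors of all topic terms."""
    return vector_sum((embeddings[t] for t in vocabulary if t in embeddings), dim)


def post_vector(words, vocabulary, embeddings, topic, dim):
    """P: the sum of cos(w_j, T) * w_j over post words found in the vocabulary."""
    weighted = []
    for word in words:
        if word in vocabulary and word in embeddings:
            weight = cosine(embeddings[word], topic)
            weighted.append([weight * x for x in embeddings[word]])
    return vector_sum(weighted, dim)


def relevance(words, vocabulary, embeddings, dim):
    """r = cos(T, P)."""
    topic = topic_vector(vocabulary, embeddings, dim)
    return cosine(topic, post_vector(words, vocabulary, embeddings, topic, dim))


def select_relevant(scores, threshold, k):
    """Keep posts scoring at least `threshold`, then return the k best post ids."""
    kept = sorted(((s, p) for p, s in scores.items() if s >= threshold), reverse=True)
    return [post for _, post in kept[:k]]


# Hypothetical toy data.
EMBEDDINGS = {"botnet": [1.0, 0.0], "ddos": [0.8, 0.6], "malware": [0.6, 0.8]}
VOCABULARY = {"botnet", "ddos", "malware"}
```

A post that shares no term with the topic vocabulary yields a zero post vector, and this sketch assigns it a relevance of 0.0 rather than dividing by zero.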


3.4.2 The CTI Extraction Submodule 


After the Content Ranking component decides which of the collected websites are
more likely to contain Cyber-Threat Intelligence, our system has to be able to
extract that CTI. To do so, we employ several mechanisms, such as Named Entity
Recognition with learned and Regex-based entities, Dependency Parsing to
identify exploits, malware and vulnerabilities based on the structure of
documents, as well as a novel CPE suggestion engine for aiding semi-automated
linking to known platform/vulnerability naming schemes.


Named Entity Recognition. The primary technique that was used for the task was 
Named Entity Recognition (NER). This technique can identify specific entities that 
have the potential to lead to CTI discovery. 

Instead of training a NER model with entity-annotated data from scratch, a pre- 
trained model was used to detect generic entities that were not strictly limited to 
the topic of CTI. 

To assist the pre-trained model in finding more entities related to the topic,
a Phrase Matcher functionality was used. The Phrase Matcher can perform partial
and full matches to unique multi-word phrases and map them to specified named
entities. The phrases that we imported to the model were full names of
companies/organisations and products extracted from JVN, and they were mapped
to the ORG and PRODUCT entities (Table 3.3).

Apart from the entities that the pre-trained model was able to identify, several 
domain-specific entities were also introduced. These entities were added to the
NER pipeline by defining a Regular Expression for each one, via the Regex
Matcher functionality.

Table 3.3 shows the entities that INTIME is able to identify, along with the 
mechanisms responsible for the identification. Figure 3.4 shows some identified 
entities on a sample text. 
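
The regex-based part of the pipeline can be sketched with Python's standard `re` module. The three patterns below are simplified assumptions written for this example and are not the expressions used by INTIME; for instance, the IP pattern does not check that each octet is at most 255.

```python
import re

# Simplified, hypothetical patterns for three of the domain-specific entity types.
ENTITY_PATTERNS = {
    "CVE": re.compile(r"\bCVE-\d{4}-\d{4,}\b"),
    "CWE": re.compile(r"\bCWE-\d+\b"),
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_entities(text):
    """Return (matched_text, label) pairs in order of appearance."""
    found = []
    for label, pattern in ENTITY_PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), match.group(), label))
    return [(value, label) for _, value, label in sorted(found)]
```

For example, `find_entities("CVE-2017-5638 (CWE-20) was exploited from 10.0.0.5")` yields one CVE, one CWE and one IP entity, in that order.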


CPE Suggestion Engine. In the previous section, we outlined the process of
extracting Named Entities from unstructured text documents in an attempt to
identify Cyber-Threat Intelligence. While this is an important task on its own,
the extracted information is still largely unstructured. Since the Data
Management and Sharing module already contains large amounts of verified and
structured CTI, a mechanism that helps security experts link the newly
discovered CTI to existing events would benefit the entire CTI pipeline.

The most obvious entities that we could use to map the newly-found data to the 
structured CTI are “CVE” and “CPE”. However, non-technical users do not tend 
to use these types of identifiers when they converse in the context of web forums, 
etc., so the likelihood of encountering them in significant enough numbers is low. 
Because of that, a hybrid solution was devised, the CPE Suggestion Engine, which 
we will describe below. 

Although CVE and CPE entities are very rare in a free-text setting, Product
entities appear with high frequency in the relevant gathered texts.
Consequently, a product database was used to create a recommendation engine
which, by utilising text retrieval methods, suggests the most likely CPEs
associated with a particular Product entity. Those suggestions are added to the
object that is sent to the Data Management and Sharing component, where a
security expert can evaluate the suggested CPEs to see whether they actually
match an existing Event and, subsequently, link the Objects when necessary.


Table 3.3. Supported entity types.

Entity                   Description                                             Source
PERSON                   People, including fictional.                            Pre-trained model
ORG                      Companies, agencies, institutions, etc.                 Pre-trained model, PhraseMatcher
PRODUCT                  Objects, vehicles, foods, etc.                          Pre-trained model, PhraseMatcher
DATE                     Absolute or relative dates or periods.                  Pre-trained model
TIME                     Times smaller than a day.                               Pre-trained model
MONEY                    Monetary values.                                        Pre-trained model, RegexMatcher
CVE                      Common Vulnerabilities and Exposures (CVE) identifier.  RegexMatcher
CPE                      Common Platform Enumeration (CPE) identifier.           RegexMatcher
CWE                      Common Weakness Enumeration (CWE) identifier.           RegexMatcher
CVSS2_VECTOR             Common Vulnerability Scoring System (CVSS) v2.          RegexMatcher
CVSS3_VECTOR             Common Vulnerability Scoring System (CVSS) v3.0-v3.1.   RegexMatcher
IP                       IP address.                                             RegexMatcher
VERSION                  Software version.                                       RegexMatcher
FILE                     Filename or file extension.                             RegexMatcher
COMMAND/FUNCTION/CONFIG  Shell command/code function/configuration setting.      RegexMatcher


Figure 3.4. Identified entities.
[Figure: a sample text about an Apache Struts vulnerability, annotated with the
detected entities, e.g., "Apache Struts" as PRODUCT and ".ELF" as FILE.]

Suggestions were preferred over exact string matching on Product entities
mainly because, in a free-text setting, a user might abbreviate part of the
product name, use only its common popular name (e.g., “Struts” instead of
“Apache Struts”), or simply make a spelling mistake. To present accurate
suggestions, the problem was approached by performing fuzzy text search. To
that end, the CPE Suggestion Engine uses n-grams, a common method for
calculating text similarity. Initially, the n-grams for each product in the
database are generated and indexed. Then, a query is performed for each
discovered Product entity and, by using MongoDB’s Text Search Operator, the
module compares the similarity of the query’s n-grams to the indexed n-grams.
In the end, the top 10 results are returned, sorted by text match score.
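
The n-gram idea can be sketched in a few lines. Note that this example deliberately swaps MongoDB's text search for an in-memory Jaccard similarity over character trigrams, so the scores are not comparable to MongoDB text match scores; the product names and function names are hypothetical.

```python
def ngrams(text, n=3):
    """Character n-grams of the lower-cased text, padded with one space per side."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def build_index(products, n=3):
    """Pre-compute the n-gram set of every product name."""
    return {name: ngrams(name, n) for name in products}


def suggest(query, index, top=10, n=3):
    """Return up to `top` product names ranked by Jaccard similarity to the query."""
    query_grams = ngrams(query, n)
    scored = []
    for name, grams in index.items():
        union = query_grams | grams
        score = len(query_grams & grams) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, name))
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top]]


# Hypothetical product database.
INDEX = build_index(["apache struts", "apache tomcat", "oracle weblogic"])
```

Both an abbreviated name such as "Struts" and a misspelling such as "Apache Strutts" still rank "apache struts" first, which is the behaviour that motivates fuzzy matching over exact string comparison.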


NP-Chunking. For the final part of the CTI Extraction, a Dependency Parser was
used to perform the task of “Noun Phrase Chunking” (NP Chunking). NP Chunking
is the subset of Text Chunking that deals with the task of recognizing
non-overlapping text parts that consist of noun phrases (NPs).

While most of the CTI that we can expect to discover can be effectively
modelled by the Named Entity Recogniser, some domain-specific concepts cannot
be adequately defined as named entities. Such concepts include types of attacks
and system vulnerabilities, exploit names, malware names, etc. They could be
added to the Phrase Matcher as terminology lists, but because such concepts are
described in many varying ways in non-technical texts, the effectiveness of the
system would not be satisfactory.

After a thorough observation of the collected data, we discovered a common
pattern: these concepts are typically expressed as Noun Phrase chunks. For
example, phrases such as “database injection vulnerability”, “brute-force
attack” and “privilege escalation exploit” are all NPs that can be classified
as Cyber-Threat Intelligence, and we would not be able to identify them with
our pre-existing infrastructure.

To this end, as part of our CTI Extraction module, we have implemented an NP 
Chunker that detects all the NP chunks found in a document and groups them in 
an object called HIGHLIGHTS. 

For instance, in the document presented in Figure 3.4, the HIGHLIGHTS
would be the following:

• “Apache Struts vulnerability”,
• “remote code execution bug”, and
• “April 2017 Critical Patch Update”

This method greatly assists security experts in quickly identifying whether a
document contains actionable CTI, or in linking it to existing CTI objects from
various sources.


3.5 Data Management and Sharing 


In this section we present the Data Management and Sharing component, which is 
a full-stack solution, aiming to provide a complete proactive methodology for the 
tasks of CTI management and sharing. The component is able to store CTI from 
various sources, merge artifacts that concern information about the same CTI, and 
inter-correlate similar CTI. After storing all gathered CTI, the Data Management 
and Sharing component is able to present all stored information in human-readable 
formatting, through the MISP web-application. The interface enables users to fur- 
ther edit, analyze, and enrich the stored CTI. Finally, through the utilization of 
MISP, it enables the sharing of the stored CTI, in both human and machine- 
readable formats. 

In the following sections, we provide an overview of the component architec- 
ture (Section 3.5.1), and a brief description of MISP (data model, sharing proper- 
ties, and functionalities), in Section 3.5.2. Then, in Section 3.5.3 we present the 
MISP implementation and customizations within the Data Management and Shar- 
ing component. Finally, we describe the component's functionality in Section 3.5.4. 


3.5.1 Component Overview 


First, we have identified different CTI sources, which are vulnerability and exploit 
databases, containing analyzed CTI, in the form of vulnerability and exploit 
reports. These reports mainly consist of a plethora of useful and actionable intel- 
ligence about the vulnerabilities and exploits, such as a description of the vulner- 
ability at hand, an exploit proof-of-concept, a list of the affected products’ con- 
figurations (CPEs), metrics that provide an impact factor for the affected product 
(CVSS), publication and modification dates, references to similar reports, and a 
unique identifier that has been assigned to the vulnerability at hand (CVE ID). 
However, while the aforementioned sources often provide reports about the same
unique CVE ID, these reports tend to differ. This happens because the
information available changes over time; analyses that occurred at different
times may therefore provide different metrics in the final reports. To overcome
this issue, we gather all publicly available reports from these sources and
parse them one by one to extract the CTI provided, using the Feed Monitoring
and Targeted Web Scraping modules. Then, we store the parsed CTI in a clustered
manner, with regard to the unique CVE ID encompassed. The selected platform for
storing and disseminating the gathered CTI is MISP. These clusters are called
events in the MISP platform, and the clustering of the reports occurs at the
event management phase illustrated in Figure 3.5, which presents an abstracted
view of the component architecture. MISP provides the information stored in its
database in both human and machine-readable formats, and allows users to access
it either through a GUI or via a REST API. Finally, MISP offers various tools
in the GUI that enable users to review the gathered CTI, eliminate false
positives or comment on the artifacts, and further analyze and enrich CTI
through correlation processes.

Figure 3.5. Data Sharing and Management architecture.
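
The clustering of reports by CVE ID can be illustrated with a minimal sketch. The report fields (`cve_id`, `source`, `cvss`), the placeholder CVE IDs and the scores are all made up for this example; they are not taken from any real source or from the MISP data model.

```python
from collections import defaultdict


def cluster_by_cve(reports):
    """Group parsed reports into one cluster (one event) per CVE ID."""
    events = defaultdict(list)
    for report in reports:
        events[report["cve_id"]].append(report)
    return dict(events)


# Illustrative placeholder reports; IDs, sources and scores are invented.
REPORTS = [
    {"cve_id": "CVE-2099-0001", "source": "feed-a", "cvss": 7.5},
    {"cve_id": "CVE-2099-0001", "source": "site-b", "cvss": 9.8},
    {"cve_id": "CVE-2099-0002", "source": "feed-a", "cvss": 5.0},
]
EVENTS = cluster_by_cve(REPORTS)
```

Keeping every report inside its cluster, rather than overwriting earlier ones, preserves the differing metrics that separate sources may give for the same CVE ID.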


3.5.2 MISP 


As outlined in the literature [7, 8], MISP leads the field as the most suitable
platform for supporting the CTI life-cycle. Thus, it is the platform of choice
for the CTI management and sharing of INTIME. Specifically, the Data Management
and Sharing module uses, extends and enhances MISP, in order to enrich its
storing capabilities with additional context. In the rest of this section, we
describe the essential details of MISP regarding (i) its data model, (ii) its
CTI sharing properties and features, and (iii) its additional features.

The main objective of the MISP data model is to have a minimum viable format, 
which can be extended, according to the needs of additional complexity, instead of 
trying to capture all possible future requirements in advance. A new entry in MISP 
is called an event object, which is defined by a set of characteristics, along with 
all kinds of respective descriptions for indicators, including attachments. These 
characteristics are called attributes in MISP, and they provide all useful information 
to the event, such as an IoC date, threat level, comments, organization that created 
it, and so on. Attributes are mainly described by two fields: category and
type. The main difference is that the category field describes what the
attribute represents, such as network activity or financial fraud, while the
type field describes how the attribute represents the chosen category. For
example, an attribute type might be a checksum, a filename, a hostname, an IP
address, and so on. The actual payload of the attribute is stored in the value
field.

Any CTI artifact, such as a CVE ID of a vulnerability, is stored in the MISP 
database in the form of attributes. Multiple attributes can be grouped to form an 
object, which forms a bigger CTI artifact, like a vulnerability report. Both attributes 
and objects must be attached to events, which basically serve as the records of the 
artifacts’ storage. Finally, MISP enables an event to be correlated with other events, 
through matching techniques over their attributes. Each correlation that may occur 
between events serves as a bond, which also indicates the matching attribute. In 
Figure 3.6, we present an abstract overview of the database schema part, which is 
used for storing the CTI. 

Specifically: 


• The events table is a meta-structure scheme, where attributes, objects and
meta-data are embedded to compose a sufficient set of indicators that is able
to describe a specific case, like a vulnerability report. An event can be
composed from an incident, a security analysis report or a specific threat
actor analysis. The meaning of an event derives solely from the information
embedded within it. In our case, one event is a collection of objects that are
used to describe the CTI artifacts.

• Objects serve as a contextual bond between a list of attributes within an
event. Their main purpose is to describe more complex structures than can be
described by a single attribute. Each object is created using an Object
Template and carries the meta-data of the template used for its creation.
Objects belong to a meta-category and are defined by a name.

• Attributes are used to describe the indicators and contextual data of an
event. The main information contained in an attribute is formed by
category-type-value triplets, where the category and type give meaning and
context to the value. Through the various category-type combinations, a wide
range of information can be conveyed.

• Correlations serve as a bonding system between the stored events. Their main
purpose is to describe any matching of artifacts that may have occurred between
the events through the MISP Correlation Engine.

Figure 3.6. MISP database schema abstracted overview.
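
The relationship between events, objects and attributes can be mirrored in a small sketch using Python dataclasses. This is a simplified illustration of the structure described above, not the MISP database schema or the PyMISP object model; the class names, field names and sample values are chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attribute:
    category: str  # what the attribute represents
    type: str      # how the value represents the category
    value: str     # the actual payload


@dataclass
class MispObject:
    name: str
    meta_category: str
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class Event:
    info: str
    attributes: List[Attribute] = field(default_factory=list)
    objects: List[MispObject] = field(default_factory=list)

    def all_attributes(self):
        """Return standalone attributes followed by those grouped in objects."""
        grouped = [a for obj in self.objects for a in obj.attributes]
        return self.attributes + grouped


# Illustrative values only.
report = MispObject(
    name="vulnerability",
    meta_category="vulnerability",
    attributes=[Attribute("External analysis", "vulnerability", "CVE-2017-5638")],
)
event = Event(info="Apache Struts report", objects=[report])
event.attributes.append(Attribute("Network activity", "ip-dst", "192.0.2.10"))
```

Flattening attributes through `all_attributes` is a convenience of this sketch; it makes it easy to hand every category-type-value triplet of an event to a matching routine.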


With regard to the sharing model of MISP, there are two main aspects. First,
MISP enables its users to select the sharing level of the information stored in
the MISP DB. For example, the sharer can disseminate the information at hand to
a specific organization, a community of organizations, interconnected
communities, all participants of MISP, or even define a sharing group manually.
The second main aspect of MISP is the proposals feature. While the modification
of events is only permitted to members of the creating organization, proposals
allow users to suggest changes to an event created by another organization. A
proposal is reported back to the original creator of the event, who may accept
the change or discard it. The outcome of the creator’s decision is then
propagated to all interconnected instances. An example of this feature is the
reporting of false positives to the event creator, asking for an error
correction. Finally, MISP is able to provide any information stored in its
database in both human and machine-readable formats, and allows users to access
it either through a GUI or via a REST API, with respect to the aforementioned
aspects of its sharing model.

Furthermore, MISP provides various complementary features, including: 


PyMISP.¹² A Python library for accessing the MISP API. PyMISP provides users
with fetching, adding, updating, deleting and searching capabilities over the
stored events/attributes or samples.


The free-text import tool. It enables users to copy and paste raw data (in free-text 
format) into a single data field, that through a heuristic algorithm matches the 
attributes. The resulting attributes are then presented to the user who proceeds to 
validate the findings. 


MISP tagging mechanism. It enables users to define customizable tags, through
which they can later filter the events and classify the encompassed
information. Furthermore, tags can be exported, allowing other MISP instances
to reuse the same tags.


MISP taxonomies. A taxonomy is a collection of triple tags, each described by a
namespace, a predicate and a value. By using the taxonomies repository,
organizations share a common format for describing incidents. Furthermore, if
the predefined taxonomies do not fit the description of an event, users can
define their own.
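
A taxonomy tag is commonly written as `namespace:predicate="value"`, with the value part being optional. The parser below is a minimal sketch of how such a tag could be split into its three components; the regular expression and function name are written for this example and are not part of MISP or PyMISP.

```python
import re

# namespace:predicate="value", where the ="value" part is optional.
TAG_RE = re.compile(
    r'^(?P<namespace>[^:="]+):(?P<predicate>[^:="]+)(?:="(?P<value>[^"]*)")?$'
)


def parse_tag(tag):
    """Split a taxonomy tag into (namespace, predicate, value or None)."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"not a taxonomy tag: {tag!r}")
    return match.group("namespace"), match.group("predicate"), match.group("value")
```

For example, `parse_tag('admiralty-scale:source-reliability="a"')` returns the three components, while `parse_tag("tlp:amber")` returns `None` as the value.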


MISP instances syncing. MISP is provided with a synchronization protocol, which
supports four main features: pull, push, cherry-picking, and the feed system.
The pull feature allows a MISP instance to discover available and accessible
events on a connected instance and download any new or modified events. The
push mechanism allows a MISP instance to convert events to a JSON format that
is transferable to remote instances. The cherry-picking feature is an
alternative to the pull method, which allows users to decide which events
should be pulled to the local instance. Finally, the feed mechanism allows a
MISP instance to generate a dump of JSON files, which derive from a selection
of events that an organization wants to publish. The output can then be served
via a web server, through which other MISP instances can access and retrieve
the contents via the UI, similarly to cherry-picking.


12. https://pymisp.readthedocs.io/en/latest/


MISP sightings. MISP provides a sighting system, which allows users to react to
the attributes of an event. Originally, it was designed to provide an easy method
for users to verify a given attribute, hence raising its credibility. Sightings
were later extended to signal false positives and to give an expiration date to
attributes. A user may spot an attribute several times, and can therefore add
several sightings to a single attribute. Similarly, a user can signal a single
attribute as a false positive several times. Finally, some attributes are valid
only for a certain period of time (for instance, a phishing campaign that is
assumed to be up for only one week). In this case, users can assign an expiration
date to the attribute, but there can only be one valid expiration date per
organization of the MISP instance.
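
The sighting rules above can be sketched as a small in-memory model. The numeric type codes follow MISP's sighting types (0 for a sighting, 1 for a false positive, 2 for an expiration); the class and field names are hypothetical, and the choice that a newer expiration replaces the older one for the same organization is an assumption made to honour the "one valid expiration date per organization" rule.

```python
from dataclasses import dataclass, field

# Sighting type codes: 0 = sighting, 1 = false positive, 2 = expiration.
SIGHTING, FALSE_POSITIVE, EXPIRATION = 0, 1, 2


@dataclass
class AttributeSightings:
    """Hypothetical in-memory model of the sightings attached to one attribute."""
    sightings: list = field(default_factory=list)        # (org, user), repeats allowed
    false_positives: list = field(default_factory=list)  # (org, user), repeats allowed
    expirations: dict = field(default_factory=dict)      # org -> one expiration date

    def add(self, kind, org, user, date=None):
        if kind == SIGHTING:
            self.sightings.append((org, user))
        elif kind == FALSE_POSITIVE:
            self.false_positives.append((org, user))
        elif kind == EXPIRATION:
            # Assumption: a newer expiration replaces the organization's old one.
            self.expirations[org] = date
        else:
            raise ValueError(f"unknown sighting type: {kind}")
```

Repeated sightings and false-positive signals by the same user simply accumulate, while expirations are keyed by organization.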


A particularly interesting additional feature of MISP is its correlation engine,
which covers all the correlations between attributes, as well as more advanced
correlations like fuzzy hashing correlation (e.g., ssdeep) or CIDR block matching.
Correlations can be enabled or disabled per event and per attribute. The value
field of an attribute is its main payload, which is described by the category and
type columns, and it is used by the correlation engine to find relations between
events. Specifically, after each event creation, the correlation engine of MISP
scans the database for matches of the event’s correlatable attributes, with
regard to their category and type. For each match, MISP stores two correlation
entries in the database: one that points from the recently created event to the
previously stored one, and one that points from the previously stored event to
the recently created one. Both entries reference the events through their unique
event IDs, along with the unique IDs of the corresponding attributes.
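
The two-entry bookkeeping described above can be illustrated with a minimal sketch. It is not MISP's implementation: it compares attribute values for exact equality only (no ssdeep or CIDR matching), and the dictionary keys are hypothetical names chosen for the example.

```python
def correlate(new_attrs, stored_attrs):
    """Return two directed correlation entries for every value match.

    Each attribute is a dict with the hypothetical keys
    "id", "event_id" and "value".
    """
    entries = []
    for new in new_attrs:
        for old in stored_attrs:
            if old["event_id"] == new["event_id"]:
                continue  # do not correlate an event with itself
            if old["value"] != new["value"]:
                continue
            # Entry pointing from the new event to the stored one ...
            entries.append({
                "event_id": new["event_id"], "attribute_id": new["id"],
                "related_event_id": old["event_id"],
                "related_attribute_id": old["id"],
            })
            # ... and its mirror, from the stored event to the new one.
            entries.append({
                "event_id": old["event_id"], "attribute_id": old["id"],
                "related_event_id": new["event_id"],
                "related_attribute_id": new["id"],
            })
    return entries
```

Storing both directions means that either event can list its related events with a single lookup on its own ID.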


3.5.3 MISP Implementation and Customization


To fully accommodate MISP to our needs, we make use of the platform’s provided
tools to define custom objects that are able to fully encompass the CTI artifacts
of the monitored sources. To best describe the artifacts that result from the
parsing procedure of our system, we need to store them in MISP in the most suitable
objects: vulnerability¹³ and weakness¹⁴. Additionally, MISP provides a method for
creating custom MISP objects, which we use to create two custom objects for our
component, namely the vuldb-vulnerability and expdb-poc objects, which enrich the
attributes of the vulnerability and exploit-poc objects, respectively. Finally, we
created one additional custom object (crawled_obj) that is able to encapsulate any
possible artifact deriving from the Crawling and Social Media Monitoring submodules,
as they derive from the Data Analysis and the CTI Extraction tasks.

Vulnerability objects describe CVEs, which refer to published, unpublished, or
under-review vulnerabilities for software, equipment or hardware. Specifically,
vulnerability objects are able to describe CVE entries, with attributes that cover
publication/modification dates, references, vulnerable configurations (in the form
of CPEs), description and summary of the vulnerability, CVSS metrics, and of course,
the CVE ID.

Weakness objects describe CWEs, which refer to usable, incomplete, draft or
deprecated weaknesses for software, equipment or hardware. CWE serves as a common
language, a measuring technique for security tools, and as a baseline for weakness
identification, mitigation, and prevention efforts. Such objects contain attributes
that describe the corresponding CWEs, such as the description, name, and status of
the weakness, and the CWE ID.

The vuldb-vulnerability object is an enriched version of the vulnerability object,
for CVEs. In particular, it provides the attributes needed to store supplementary
CTI parsed from vulnerability-oriented sources, such as price estimations, CVSS
strings from external sources (NVD, Vendor, Researcher), and exploitability and
remediation statuses.

The expdb-poc object is a differentiated version of the exploit-poc object,
describing a proof-of-concept or an exploit of a vulnerability. This object often
has a relationship with a CVE entry, via a CVE ID reference. The difference between
expdb-poc and exploit-poc is that we created a credit field for expdb-poc.
Furthermore, instead of downloading and storing all exploit proofs-of-concept, we
point towards the link of the PoC raw code, through references.

The crawled_obj object describes CTI that may result from the Crawling and
Social Media Monitoring submodules, through the Data Analysis and CTI Extraction
procedures. First, the object stores several attributes that refer to meta-data
about the crawling, such as the crawled document’s id, discovery timestamp, title,
raw text, and source URL, along with their corresponding MD5 hashes. Additionally,
it stores crawling meta-data like the id and the type of the crawler that has
discovered the document, a relevance score assigned to the document by the Content
Ranking submodule, and a highlight identified by the CTI Extraction submodule. Then,
the rest of the CTI artifacts deriving from the CTI Extraction submodule are stored
in the corresponding fields of the defined object, and they may be vulnerable
configurations, CVEs, CWEs, organizations, products, versions and possible CPEs, CVSS
metrics, files, IPs, commands, functions, configs, money values, dates and timestamps.


13. https://www.misp-project.org/objects.html#_vulnerability
14. https://www.misp-project.org/objects.html#_weakness

Finally, all MISP Objects that were used in our system contain a credit field, 
which we used to store the source of the parsed CTI, using unique string identifiers 
for each source. 


3.5.4 Component Functionality 


In this section we describe the Data Management and Sharing component’s func- 
tionality. Specifically, we describe the source monitoring procedure, which is respon- 
sible for periodically gathering CTI from our monitored sources. Then, we describe 
the data management procedure, which we particularly designed to (i) structure
incoming CTI into the suitable objects (object structuring), (ii) check whether any
incoming CTI is indexed by our component or not (event lookup), (iii) cluster
objects into the corresponding CTI entries (event creation), and (iv) manage updates
and modifications of the stored CTI (event modification). Finally, we present the MISP
functionalities implemented for the intercorrelation procedure of the indexed CTI 
(events correlations), and the CTI sharing and reviewing. 


Source Monitoring. During this phase, our system uses the Feed Monitoring and
the Targeted Web Scraping modules, in order to extract the encompassed CTI. The
monitored sources can be divided into two categories. The first category contains
sources that provide structured data feeds (in JSON and XML formats) of their
information collections. For this category, we use the Feed Monitoring module,
which extracts CTI through JSON/XML parsing techniques. The rest of the monitored
sources belong to the second category, which refers to sources that do not provide
data feeds, but expose the contents of their database on web-based interfaces, in a
structured manner. For this category we implemented standard scraping techniques
like XPath querying and HTML parsing, through the Targeted Web Scraping module.
The source monitoring procedure is executed with an adjustable monitoring period,
which can be configured in the Monitoring Scheduler module.
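
As an illustration of the first category, the sketch below parses a JSON feed and an XML feed with the Python standard library. The feed layouts (an "items" array, `<entry>` elements with an `id` attribute and a `<summary>` child) are hypothetical and do not correspond to any specific monitored source.

```python
import json
import xml.etree.ElementTree as ET


def parse_json_feed(text):
    """Extract (id, summary) pairs from a hypothetical JSON feed."""
    return [(item["id"], item.get("summary", ""))
            for item in json.loads(text)["items"]]


def parse_xml_feed(text):
    """Extract (id, summary) pairs from a hypothetical XML feed."""
    root = ET.fromstring(text)
    # ElementTree supports a limited XPath subset; ".//entry" selects
    # every <entry> element below the root.
    return [(entry.get("id"), entry.findtext("summary", default=""))
            for entry in root.findall(".//entry")]
```

Sources of the second category would need an HTML parser in place of these helpers, but the output (identifier plus extracted fields) can keep the same shape.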


Object Structuring. After extracting all actionable CTI from the parsing
procedures described in the previous section, our component proceeds to structure it
in the format of the suitable MISP objects, in accordance with the objects described
in Section 3.5.3, with the use of the PyMISP library (as presented in Figure 3.5). To
achieve that, the component generates the MISP objects in JSON format, as triplets
of the attributes’ field, value, and comment, with the values extracted from the
parsing phase. The comment field is used to store information that enriches the
value, for example by declaring whether a reference comes from the affected vendor
or from another source of vulnerability notes.
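
A minimal sketch of the triplet-based structuring is given below. It builds a plain dictionary rather than calling PyMISP, so the key names ("name", "Attribute", "object_relation") are only loosely modelled on MISP's object JSON, and a real object additionally needs attribute types and template metadata. The field names and values in the usage example are illustrative.

```python
import json


def structure_object(name, triplets, credit):
    """Build a JSON-serializable object from (field, value, comment) triplets."""
    attributes = [
        {"object_relation": field, "value": value, "comment": comment}
        for field, value, comment in triplets
    ]
    # The credit field records which monitored source produced the CTI.
    attributes.append({"object_relation": "credit", "value": credit, "comment": ""})
    return {"name": name, "Attribute": attributes}


if __name__ == "__main__":
    obj = structure_object(
        "vulnerability",
        [("references", "https://example.com/advisory", "vendor advisory")],
        credit="example-source",
    )
    print(json.dumps(obj, indent=2))
```

Here the comment on the reference plays the role described above: it states that the link originates from the affected vendor.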


Event Management. The event management phase executes in parallel with the
source crawling and parsing phase described in the previous section. During this
phase, either new events are created, each time new CTI arrives at the Data
Management and Sharing component, or previously stored events are modified, due to
updated CTI artifacts. This is also the phase during which the clustering of the
gathered CTI occurs. In the following sections, we describe the process followed
in order to achieve that.


Event Lookup. First of all, in order to determine whether the CTI which arrived is
uncatalogued by the system or not, our component queries the MISP instance with
the CTI’s unique identifier at hand. Through the use of PyMISP, the component
queries MISP for any event that regards the currently parsed CTI’s unique ID, by
looking into the events’ info field, which is used to store such identifiers. The
query can lead to two possible outcomes: (a) the parsed CTI ID is not already
stored, and therefore a new event should be created, or (b) the parsed CTI ID exists,
and therefore one or more existing events should be modified. In the second case,
the component returns the corresponding MISP Event in JSON format, through
PyMISP, and it also temporarily stores the corresponding MISP Event ID, as it is
stored in the MISP instance.


Event Creation. If the parsed CTI is unindexed, then through PyMISP, the
component follows a three-step approach to catalogue it. First, it generates a new
event in the MISP instance, setting the event’s info field to match the parsed
unique CTI ID. Then, it generates the required MISP Objects (with regard to the
specifications of each monitored source) from the constructed JSON structures of
the object structuring phase. Additionally, the generated objects’ validity is
checked both locally, through the PyMISP library’s object definitions, and
externally, through a PyMISP request for the MISP instance’s object definitions.
Both definitions must be the same for this step to succeed, and they are expressed
in the form of JSON files, in the PyMISP library’s files and the MISP instance’s
files. Finally, it attaches the generated MISP Objects to the event that was
generated in the first step, on the MISP instance. An overview of the generated
event through the MISP UI is presented in Figure 3.7(c).


Figure 3.7. MISP events view: (a) events’ correlation graph, (b) events’ timeline, (c)
events’ objects and attributes view.
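
The lookup and the three-step creation can be sketched against an in-memory stand-in for the MISP instance. Everything below is hypothetical scaffolding: the real component talks to MISP through PyMISP, whereas `InMemoryMisp`, `validate` and `lookup_or_create` are names invented for this example.

```python
class InMemoryMisp:
    """Hypothetical stand-in for a MISP instance."""

    def __init__(self):
        self.events = {}
        self._next_id = 1

    def search_by_info(self, cti_id):
        # Event lookup: the info field stores the CTI's unique identifier.
        return [e for e in self.events.values() if e["info"] == cti_id]

    def add_event(self, info):
        event = {"id": self._next_id, "info": info, "Object": []}
        self.events[event["id"]] = event
        self._next_id += 1
        return event


def validate(obj, local_defs, remote_defs):
    """An object is valid only if both definition sets know it and agree."""
    name = obj["name"]
    return name in local_defs and local_defs[name] == remote_defs.get(name)


def lookup_or_create(misp, cti_id, objects, local_defs, remote_defs):
    found = misp.search_by_info(cti_id)
    if found:
        return "existing", found            # outcome (b): modify later
    event = misp.add_event(cti_id)          # step 1: new event, info = CTI ID
    for obj in objects:                     # step 2: validate generated objects
        if not validate(obj, local_defs, remote_defs):
            raise ValueError(f"definition mismatch for {obj['name']}")
    event["Object"].extend(objects)         # step 3: attach objects to the event
    return "created", [event]
```

Comparing the local and remote definitions before attaching anything mirrors the requirement that both definition sets be identical.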


Event Modification. An event modification may occur in two cases: the system
parsed CTI which is either (a) not yet stored by the component, but regards an
existing CVE ID entry (which happens due to overlapping CTI from different
sources), or (b) an updated version of previously stored CTI. Any modification that
occurs during this phase makes use of the previously stored MISP Event, which
derives from the event lookup phase. Its CTI ID identifies, through PyMISP, the
event on the MISP instance that is going to be modified.

Regarding the first case, the component checks the credit field of each object
within the event at hand. If there is no match, it proceeds to generate the required
MISP Objects, and then it attaches them to the existing event.

For the second case, similarly to the previous case, the component generates the
corresponding MISP Objects and checks the credit field of each object within the
event at hand. If there is a match, it proceeds to check the modified attribute of
the matching object, which holds the modification date of the CTI encompassed. If
the modification date of the newly parsed CTI is more recent than the previously
stored one, the component deletes the stored object and attaches the newly
generated object to the MISP Event at hand.
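
Both modification cases reduce to one decision per incoming object, sketched below. Objects are simplified to flat dictionaries with hypothetical "name", "credit" and "modified" keys; matching on the object name in addition to the credit field is an assumption, made so that one source may contribute several object types to the same event.

```python
from datetime import date


def apply_update(event, new_obj):
    """Attach, replace or ignore new_obj, following the two cases above."""
    for index, stored in enumerate(event["Object"]):
        same_source = (stored["name"] == new_obj["name"]
                       and stored["credit"] == new_obj["credit"])
        if not same_source:
            continue
        if new_obj["modified"] > stored["modified"]:
            # Case (b): newer CTI from the same source replaces the stored object.
            event["Object"][index] = new_obj
            return "replaced"
        return "unchanged"
    # Case (a): no object from this source yet, so attach the new one.
    event["Object"].append(new_obj)
    return "attached"


if __name__ == "__main__":
    event = {"info": "CTI-0001", "Object": []}
    first = {"name": "vulnerability", "credit": "source-a",
             "modified": date(2021, 1, 1)}
    print(apply_update(event, first))
```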

Any modifications or additions of CTI artifacts in the MISP events appear on
the MISP event view, through the events’ timeline (Figure 3.7(b)).


Events Correlations. Finally, it is important to note that, after each event 
creation/modification, the component proceeds to recalculate the correlations, 
through the MISP Correlation Engine (presented in Section 3.5.2), since there 
is a possibility that the newly stored CTI may regard the same affected products, 
as other events. After this process, the event at hand points to all related events, as 
depicted in Figure 3.7(a). 


CTI Sharing and Reviewing. In parallel with the gathering of all publicly
available CTI from the monitored sources, our system is also able to proceed with
the CTI sharing and reviewing phase. The sharing of the encompassed CTI may occur
in two ways. The first is to share CTI through the sharing features of MISP, as
described in Section 3.5.2. The second is to query the component through
the provided MISP REST API, using the required authorization credentials. In the
following section, we provide a detailed overview of how this may be achieved.


MISP REST API: RESTful Searches. As mentioned earlier, MISP provides the 
option to search its embedded database, via the provided REST API. Moreover, 
it is able to export CTI in various CTI sharing standards such as JSON, XML, 
OpenIOC, Suricata, Snort, STIX, and more. Thus, it is possible to query the MISP 
REST API, for information regarding a specific entry, and receive a response in 
the requested format. For these purposes, there are two REST endpoints; one that 
regards information on event level, and one for the attribute level. In the first case, 
a user may retrieve all related CTI to the posed query, while in the second case, the 
user may retrieve all related attributes of the stored CTI, which match the posed 
query (e.g., a vulnerability’s description). Both of these endpoints use the POST 
HTTP method to query the MISP REST API. Additionally, both endpoints enable
users to impose constraints on the requested CTI, such as dates, values (which may
also contain wildcards with the use of the “%” character), pagination of the results,
and more. Finally, MISP provides an automation functionality, which is designed to
automatically feed other tools and systems with the data of the MISP repository. 
To make this functionality available for automated tools, an authentication key is 
used. Thus, in order to gain access to the REST API of MISP, the users should 
include their uniquely generated key (as a header in the POST request). 
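
The request shape described above can be sketched with the standard library. The helper only builds the request and does not send it; the base URL and key are placeholders, and `build_rest_search` is a hypothetical name. The `restSearch` paths, the `Authorization` header and the `returnFormat`, `value`, `limit` and `page` parameters follow MISP's automation API as we understand it, so they should be checked against the documentation of the deployed MISP version.

```python
import json
import urllib.request


def build_rest_search(base_url, api_key, level, **constraints):
    """Build (but do not send) a POST request for an event- or attribute-level search."""
    if level not in ("events", "attributes"):
        raise ValueError("level must be 'events' or 'attributes'")
    body = {"returnFormat": "json", **constraints}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{level}/restSearch",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": api_key,          # the user's unique automation key
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_rest_search(
        "https://misp.example.org", "YOUR_API_KEY", "attributes",
        value="CVE-2021-%", limit=10, page=1,
    )
    print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the query; changing `returnFormat` selects one of the other export formats.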
Figure 3.8. MISP Sightings mechanism as provided in the MISP UI on the events view.


CTI Reviewing Through MISP Sightings. For the reviewing of the encompassed CTI,
the proposed component utilizes the MISP Sightings mechanism (described in
Section 3.5.2), which allows users to declare whether an artifact is a true
positive or a false positive, with regard to the vulnerabilities and exploits
stored in MISP. The sightings mechanism for the reviewing of the stored CTI can be
used through the MISP UI on the events view, as highlighted in Figure 3.8.


3.6 Conclusions 


In this chapter, we focused on facilitating the CTI life-cycle, by utilizing the appro- 
priate open-source tools, for automating the CTI gathering and sharing tasks. We 
have presented INTIME, a solution that provides an end-to-end CTI management 
platform that is able to support the collection, analysis, leveraging and sharing of 
CTI via an integrated, extensible framework. We presented the architectural solu- 
tions behind the proposed system, discussed the individual module technologies 
and provided details on the module orchestration.


Securing the constantly evolving IoT threat landscape is a challenging problem,
with severe consequences when not tackled appropriately. In response to that
challenge, the field of moving-target defense has developed, to address these threats
by utilizing game-theoretic approaches to respond to them while maintaining a
high level of availability. This work presents an implementation of an intrusion
response system, which uses a Bayesian attack graph to model the complex state
of the network and its hosts, and a partially observable Markov decision process
to choose optimal mitigation actions. In order to cope with novel and unknown
network attacks, like zero-day exploits, an alert management policy was added to
focus the POMDP on the current state of the network and provide short-term
mitigation actions. Finally, the system was evaluated against five scenarios (Mirai,
Zeus, zero-day, 10 malicious traffic replays, and BlackEnergy) executed in a
simulated SOHO environment. Evaluation results showed its high effectiveness against
traditional threats, and a slight increase in effectiveness against novel threats.


4.1 Introduction 


In recent years, the constantly evolving threat landscape has seen an increasing 
number of cyber-attacks [1], with network-level attacks, botnets, and malicious 
software becoming more and more sophisticated over time. Furthermore, these 
well-understood threats were joined by zero-day attacks (i.e., the exploitation of 
undisclosed and unpatched vulnerabilities) which by their nature pose a greater 
threat to the security of computing networks due to the lack of information about 
them. 

The detection of such threats is not trivial, because defenders often find them- 
selves evaluating the security state of their networks through noisy information 
sources—like log servers (from which the distinction of security events from a tor- 
rent of insignificant ones may be difficult) or noisy alerts from intrusion detec- 
tion systems (i.e., with an unacceptably high number of false positives/negatives). 
Current mitigation techniques, often relying on human intervention (i.e., incident 
response teams) or on existing network and host-based controls (e.g., firewalls or 
antimalware solutions), have proven to be inadequate in terms of coverage. More- 
over, such solutions usually do not take service availability into consideration before 
acting—for instance, inaccurate firewall rule application during an attack may cause 
more damage than the attack itself, as the availability of critical systems or resources 
may be severely harmed. In addition, antimalware solutions often fail to protect 
against a large number of unknown or recent threats, while also requiring human 
interaction to apply mitigation measures. 

More advanced defensive solutions have been developed with a twofold aim: 
to hinder the progression of an attack, and to gain a better understanding of the 
attacker’s tools and methods. These solutions often interact with the attacker by 
changing the structure of the network, or present more attractive targets to distract 
from other network systems. For example, honeypots achieve this by deploying 
decoy vulnerable services, while honeynets deploy attractive-looking systems as red 
herrings to distract the attacker. However, even these fail against skilled attackers
who are able to identify and avoid them.

Expanding on the idea of interacting with the attacker, moving-target defense
(MTD) techniques were developed to optimally respond to adapting and complex
threats. The main objective of MTD techniques is to effect changes to the network
structure (or attack surface) in order to minimize the attacker’s reconnaissance
ability, as well as to respond to threats while maintaining an acceptable level of
service availability. The current landscape in game-theoretic MTD approaches
[2-6] is quite promising but displays contrasting approaches in terms of attack
modeling, with some of the works showing shortcomings in time efficiency,
adaptability, or the response selection options. Furthermore, many works assess
their models mostly through simulation, which presents limitations regarding their
real-world applicability. In an attempt to provide full coverage of possible attack
scenarios, related works have a slow response time or employ inaccurate attack
modeling methods when matching security threats to a variety of applicable
environments (e.g., smart homes). In addition, most game-theoretic approaches do not
handle the alert matching process, leading to inaccurate modeling of the network
state. Because these approaches are developed to provide optimal responses in
the long term, without considering short-term responses, the most common network
threats are inadequately addressed.

Throughout the years, the motivation to automate the attack mitigation process
led to the development of intrusion response systems (IRS). Initial attempts
implemented a static mapping between detected threats and available
countermeasures [7] but lacked flexibility. This work presents the implementation of
an IRS which leverages core functionalities of various graphical network security
models (GNSM) to present a lightweight and efficient template for the application of
decision-making processes. It also presents an effective method for calculating
optimal short-term responses, so as to deal with momentary threats and zero-day
vulnerabilities in internet of things (IoT) environments, and evaluates it against
realistic attack scenarios in a simulated computer network.

The chapter is organized as follows: Section 4.2 presents the necessary back- 
ground on MTD techniques and other related works; our IRS implementation 
will be discussed in Section 4.3. Two characteristic attack scenarios for IoT
environments (namely, the Mirai botnet and a zero-day scenario) will be discussed in
Section 4.4, while the experimental setup will be presented in Section 4.5. Finally, 
the evaluation results of the IRS will be presented in Section 4.6, while concluding 
remarks and future work are provided in Section 4.7. 


4.2 Background and Related Work 


MTD is a broad field encompassing techniques and mechanisms aiming to deceive 
an attacker by changing the network topology (by implementing shifting mecha- 
nisms) and utilizing any available event-based information to monitor malicious 
activity in the network. Lei et al. in [8] explain that MTD can be studied by
elaborating decisive elements that can measure the effectiveness of the implemented
mechanisms.

Sengupta et al. in [9] indicate that MTD techniques are most advantageous when
their implemented mechanisms are not deterministic, because otherwise attackers
will ultimately be able to anticipate future shifting actions and plan their attack
strategies accordingly. The authors further discuss the implementation of MTD
techniques, focusing on the network and application layers of the open systems
interconnection (OSI) model. They note that MTD middlebox implementations, which
take advantage of existing network devices used to manipulate network traffic (e.g.,
proxies, firewalls), are problematic due to their static nature and may even disclose
information about the network to the attacker [10]. For that reason, they explain
how advanced networking technologies, such as software-defined networking (SDN)
and network function virtualization (NFV), can be used to add dynamicity to the
MTD techniques. SDN is the preferred approach in the area of MTD, as it is a more
scalable and effective solution and additionally provides an optimized method for
network mapping and multi-stage attack protection.

Cho et al. in [11] distinguish three broad MTD approaches: (a) game-theoretic, 
(b) genetic algorithm based, and (c) machine learning based. While all three are 
promising, their work focuses on a game-theoretic approach as it provides consid- 
erable advantages in terms of implementation flexibility, realistic modeling of the 
environments, and incorporation of diversified attack scenarios. 

Zonouz et al. in [12] propose the usage of a competitive Markov decision process
(CMDP), which is applied on a tree security model as an automated response and
recovery engine that preserves availability. This approach presents a holistic solution
that models the attacker as a rather intelligent entity, which avoids actions with a
low payoff, but it falls short in scalability and response time.

Shameli-Sendi et al. in [6] showcase an automated and interactive IRS which 
dynamically evaluates response actions with respect to network dependencies and 
critical processes, by constructing a static but flexible GNSM. The proposed model 
blindly triggers responses from the received alerts, which are evaluated according 
to the same security metrics (as defined for assets) to show, upon an attack, the 
negative impact of a response on different defense points. The limitation is that a 
response’s positive impact computation is static and the security state is not updated 
when a response is applied. However, an accurate evaluation of responses is pro- 
vided throughout the response process as their selection takes into account the 
attack damage cost, confidence level of the attacker and the probability of attack 
taking place. 

Miehling et al. in [3] develop an autonomous system for the defense of
attacked networks based on a Bayesian attack graph (BAG). A probabilistic model is
implemented in order to capture the attacker’s behavior when progressing through
the network. In their model, the defender is a partial observer, as the attacker’s
strategy is unknown, and tries to block the attacker’s progress through the
network by employing mitigation actions concerning network services. The authors
describe this problem as a discrete-time partially observable Markov decision process
(POMDP) and consider both network attributes (services, vulnerabilities, etc.) and
their representation in the GNSM (attack paths, belief state, etc.) of the decision
problem, so as to successfully predict the future actions of the attacker. In [4], the
authors present an IRS which takes advantage of dependency attack graphs so as to
model a POMDP in a similar manner to [3]. This newer dynamic model is able to
handle false alarms and quantify the attacker’s progression while calculating
long-term effective responses by simulating the effectiveness of decisions with a
partially observable Monte-Carlo planning (POMCP) algorithm.


4.3 System Modelling 


This section presents our proposed modeling for addressing current threats in smart
homes, small office/home office (SOHO) environments and IoT networks by taking
advantage of graph-based models and their unique characteristics, so as to form a
versatile framework for the application of MTD techniques. The IRS implementation is
divided into two sub-components, as seen in Figure 4.1: the attack graph generator
and the decision-making engine.


The high-level functionality of the IRS is as follows: 


• Initially, the IRS receives information about the network topology from the
  gateway’s network discovery module, including host IP addresses, routing
  tables, subnetwork definitions, and any discovered vulnerabilities.

• Then, the attack graph engine processes this information to generate the base
  GNSM, to perform risk analysis, and to pre-calculate all possible mitigation
  actions (firewall rules) by the response generator.

• This information is then forwarded to the decision-making engine, with the
  GNSM forming the initial state of the game-theoretic model and the
  pre-calculated mitigation actions being the defender’s actions.

• Finally, network alerts generated by the gateway’s intrusion detection system
  (IDS) are mapped onto the GNSM, which is analyzed by the response selection
  process, and the appropriate mitigation action is selected.


Figure 4.1. IRS high-level architecture.


4.3.1 Attack Graphs


GNSMs are widely used to model the security state of a network (or a host,
depending on their application) using directed graphs, to identify possible attack
paths (sequences of actions) an attacker may take to reach a desirable state (goal
condition), and to perform more complex methods of risk analysis. These paths
describe network states with nodes and state transitions with directed edges. The
nodes are usually conceptualized to be either preconditions (capabilities an
attacker must have to proceed further) or postconditions (capabilities an attacker
can obtain, as long as their preconditions are met); capabilities include acquired
privileges, existing vulnerabilities, network attributes, or actions, among others.
There are two major categories of GNSMs: attack trees and attack graphs. The former
[13] describe a single goal condition and every action required to reach it, while
the latter describe multi-stage attacks that are not restricted to a single goal
condition, focusing instead on the attacker’s actions rather than on the
consequences of those actions.

Various attack graph-based security models have been proposed through the 
years with the most important being state attack graphs (SAG) [14], logical attack 
graphs (LAG) [15] and Bayesian attack graphs (BAG) [16]. While SAGs are better
in terms of applicability, they scale exponentially, because they attempt to cover
all possible combinations of the attacker’s moves without accounting for the
generation of duplicate attack paths. LAGs describe logical dependencies among
attack goals by
employing nodes (facts) as logical statements and are considered a scalable solution 
for attack graph generation. BAGs are directed acyclic graphs where nodes represent 
random variables and edges depict conditional dependencies between node pairs; 
they are mainly used to conduct probabilistic risk analysis on networks character- 
ized by rapid changes in their topology or host attributes. 

The development of our IRS graphical model is based on the Multi-host,
Multi-stage Vulnerability Analysis Language (MulVAL), a widely-used framework for
producing LAGs in large-scale networks. Its logical dependencies describe how an
attack can be performed by considering logical facts as actions, translated into
Datalog derivation sequences. Information about the network and the discovered
vulnerabilities is translated to Datalog tuples and processed by its internal XSB
reasoning engine to produce the LAG. This model contains three node types: OR and
LEAF nodes, which describe states of network devices (security conditions), and AND
nodes, which describe conjunctive relations between OR and LEAF nodes (exploits).
Edges in this model connect preconditions to postconditions through exploit nodes.
In the IRS, the MulVAL-generated LAG is converted to a BAG by conducting cycle
elimination and by associating common vulnerability scoring system (CVSS) metrics
with its edges.
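
The text does not detail how cycle elimination is performed, so the following is only a generic sketch of one common technique: a depth-first traversal that drops every back edge, which always leaves a directed acyclic graph. The adjacency-list representation and the function name are assumptions made for the example, and the step that attaches CVSS metrics to the remaining edges is not shown.

```python
def remove_cycles(graph):
    """Return a copy of graph without DFS back edges.

    graph maps each node to a list of successors; every successor
    must also appear as a key.
    """
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on the DFS stack, finished
    color = {node: WHITE for node in graph}
    acyclic = {node: [] for node in graph}

    def visit(node):
        color[node] = GREY
        for succ in graph[node]:
            if color[succ] == GREY:
                continue                   # back edge: it would close a cycle
            acyclic[node].append(succ)
            if color[succ] == WHITE:
                visit(succ)
        color[node] = BLACK

    for node in graph:
        if color[node] == WHITE:
            visit(node)
    return acyclic
```

Which edge of a cycle is dropped depends on the traversal order, so a real converter would likely start the traversal from the attacker's entry points.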


4.3.2 Response Generation 


Remediation actions, which will be used by the decision-making engine
and the POMCP model to modify the network topology depicted by the GNSM,
are pre-calculated by the response generation submodule. These are firewall rules
that change the inter-connectivity of hosts, both within and across subnetworks,
for the purpose of blocking access to vulnerable services or hosts.

The algorithm starts by selecting a node to be blocked (usually, all exploits
contained in the BAG are selected in turn) and, using depth-first search (DFS),
explores the corresponding subgraph until LEAF nodes are reached. During this
process, nodes are examined sequentially, and those that contain enough information
and depict access states are taken into account for the creation of firewall rules
and are thus inserted into a tree structure. Additionally, each visited OR node is
added to the tree as an AND operator (as every child must be invalidated to
invalidate an OR node) and each visited AND node is added to the tree as an OR
operator (as it takes only one child to be invalidated to invalidate an AND node).
All paths that do not end in such states are terminated with a NULL node. MulVAL’s
Datalog rules are able to accurately describe detected services, as well as
service-related information such as ports and IP addresses. Furthermore, all tree
paths that are terminated with NULL nodes are removed from the tree to make
processing easier, and the remaining paths are then collapsed to remove redundant
operator sequences (see Figure 4.2). The remaining tree represents the solution in a
disjunctive normal form (DNF).


$$(R_1 \cap \cdots \cap R_\ell) \cup (R_1 \cap \cdots \cap R_p) \cup \cdots \cup (R_1 \cap \cdots \cap R_m)$$
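The tree construction and its reduction to DNF can be sketched as follows. This
is a minimal Python illustration under stated assumptions, not the IRS code:
the BAG fragment, the node names, and the rule strings are hypothetical, LEAF
nodes without access information are represented by `None` (the NULL role), and
a NULL child is simply dropped from its parent operator, as in the description
above.

```python
from itertools import product

# Hypothetical BAG fragment: node -> (type, preconditions); for LEAF nodes the
# second item is a rule string, or None when no access information exists.
BAG = {
    "exploit_ssh_103": ("AND", ["net_access_103", "vuln_ssh_103"]),
    "net_access_103": ("OR", ["reach_from_105", "reach_from_106"]),
    "reach_from_105": ("AND", ["hacl_105_103"]),
    "reach_from_106": ("AND", ["hacl_106_103"]),
    "hacl_105_103": ("LEAF", "tcp/22 10.0.10.105 -> 10.0.10.103"),
    "hacl_106_103": ("LEAF", "tcp/22 10.0.10.106 -> 10.0.10.103"),
    "vuln_ssh_103": ("LEAF", None),
}


def build(node, bag):
    """Depth-first walk from `node` down to the LEAF nodes."""
    kind, payload = bag[node]
    if kind == "LEAF":
        return ("rule", payload) if payload else None   # None acts as NULL
    # Operators are swapped: an OR node needs all children blocked (AND),
    # an AND node needs only one child blocked (OR).
    op = "and" if kind == "OR" else "or"
    children = [c for c in (build(p, bag) for p in payload) if c is not None]
    return (op, children) if children else None


def to_dnf(expr):
    """Return a set of frozensets; each frozenset is one conjunction of rules."""
    op, body = expr
    if op == "rule":
        return {frozenset([body])}
    parts = [to_dnf(child) for child in body]
    if op == "or":
        return set().union(*parts)
    return {frozenset().union(*combo) for combo in product(*parts)}


tree = build("exploit_ssh_103", BAG)
solutions = to_dnf(tree)
```

For this fragment the result is a single conjunction containing both rules,
since the exploit on the target host stays reachable unless access from both
source hosts is blocked.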


To manage the uncertainty that comes with unknown attacks, firewall rules that
block all services of every network host (global rules) are also generated.
Although this solution is not considered optimal in terms of availability,
several network-level attacks cause rapid changes to the network that the
attack graph engine cannot depict in the BAG in real time. One example is
attacks that communicate through dynamically assigned ports on the targeted
network host. This behavior is common in malware threats such as Zeus, which
uses the OS-provided API to open a connection to its command and control (C&C)
server [17], so that each communication attempt happens over a different port.


Figure 4.2. Sample tree from the response generation process.

Finally, every solution is associated with a list of affected BAG nodes (that
is, the nodes that will be considered invalidated or removed once the rule is
deployed). The decision-making engine uses this list to determine the impact of
each solution and to choose the solution that best covers the nodes it believes
have been exploited by the attacker.
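One simple way to picture this selection is to score each solution by the total
belief of the nodes it would invalidate. The sketch below does exactly that; it
is an assumption made for illustration, since the cost function of the POMDP
model in [4] is not reproduced here, and the solution names, node names, and
belief values are hypothetical.

```python
def pick_solution(solutions, belief):
    """Return the solution whose affected nodes carry the most belief mass.

    solutions: name -> set of affected BAG nodes
    belief:    node -> probability that the attacker has reached the node
    """
    def covered(name):
        return sum(belief.get(node, 0.0) for node in solutions[name])

    return max(solutions, key=covered)


belief = {"exec_code_h1": 0.9, "exec_code_h2": 0.2, "exec_code_h3": 0.1}
solutions = {
    "block_tcp22_to_h1": {"exec_code_h1"},
    "block_all_h2": {"exec_code_h2", "exec_code_h3"},
}
best = pick_solution(solutions, belief)
```

Here the narrower rule wins because the defender is far more certain that the
first host is compromised, even though the other rule affects more nodes.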


4.3.3 Decision-making Process 


The primary aim of the IRS’s decision-making engine is to choose optimal
mitigation actions, from the pre-calculated set received from the response
generation submodule, in response to sophisticated network attacks. The
game-theoretic model implemented is based on the POMDP model presented in [4],
executed on top of the BAG generated by the attack graph engine. This model
describes a game between an attacker and a defender who is a partial observer,
meaning that the attacker’s strategy is unknown to the defender. In this game,
the attacker aims to exploit vulnerabilities or execute other network attacks
in order to progress through the network and reach a goal condition. The
defender aims to block the attacker’s progression by selecting the proper
mitigation action based on the defender’s belief about the network state
(belief probabilities on the BAG) and the attacker type (a presumption about
the attacker’s strategy). Three types of attacker behavior are modeled,
representing preset notions about the attacker’s true strategy (aggression,
knowledge level, and stealthiness).

Probabilistic metrics on exploitation-oriented decisions and actions, such as
the probability of an exploitation attempt and the probability of exploitation
success, are assigned by the risk analysis process that the attack graph engine
performs on the base BAG. The POMDP model is executed in real time, with each
round (discrete time step) leveraging information received from the gateway’s
IDS to observe the attacker’s actions on the network. An observation consists
of matching the received alerts to the BAG nodes (security states) that are
considered to have been reached by the attacker. The decision-making process is
based on a belief matrix, which is the joint distribution over the security
states and attacker types. The belief is updated every round in accordance with
the defender’s observation, so that it recursively accounts for all previous
decisions. All applicable solutions are pre-computed, allowing the optimal
action to be selected and executed quickly. The cost is lowest when a firewall
rule (or a set of rules, depending on the circumstances) covers the widest node
area in the BAG.
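The role of the belief matrix can be illustrated with the correction step of a
Bayesian filter over (security state, attacker type) pairs. The sketch is
deliberately reduced: state transitions are omitted, and the states, attacker
types, and likelihood values are hypothetical rather than taken from [4] or
from the IRS.

```python
def update_belief(belief, likelihood):
    """One Bayesian correction step over (state, attacker_type) pairs.

    belief:     (state, attacker_type) -> prior probability
    likelihood: (state, attacker_type) -> P(observed alerts | state, type)
    """
    unnormalised = {key: belief[key] * likelihood[key] for key in belief}
    total = sum(unnormalised.values())
    if total == 0:
        return dict(belief)   # observation impossible under the model: keep prior
    return {key: value / total for key, value in unnormalised.items()}


states = ("h1_safe", "h1_compromised")
types = ("type1", "type2")

# Uniform prior over the joint space.
prior = {(s, t): 1.0 / (len(states) * len(types)) for s in states for t in types}

# Hypothetical likelihood of seeing an alert on h1: a noisy attacker (type1)
# on a compromised host is the most likely explanation.
likelihood = {
    ("h1_compromised", "type1"): 0.8,
    ("h1_compromised", "type2"): 0.4,
    ("h1_safe", "type1"): 0.1,
    ("h1_safe", "type2"): 0.1,
}

posterior = update_belief(prior, likelihood)
type1_belief = sum(p for (s, t), p in posterior.items() if t == "type1")
```

Because each posterior becomes the prior of the next round, the belief carries
the effect of every earlier observation without storing the alert history.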


4.3.4 Further Adjustments 


The original model in [4] follows a specific procedure for selecting the alerts
to be triggered. Alerts are considered valid when exploitation-related
preconditions are compromised. At the same time, the original model ignores any
alerts whose corresponding postconditions are already compromised and then
samples random alerts, filtered using the binomial distribution.

On many occasions, the structure of the underlying GNSM significantly affects
the attacker’s progression through the modeled network when alerts are received
in that way. The graph’s goal conditions may be reached instantly or, in other
cases, never, resulting in the absence of a mitigation action. To combat this,
the implemented POMDP model employs an alert management policy, so that unknown
attacks can be mitigated alongside traditional network attacks and exploitation
attempts. This policy operates in two modes: strict and agile. For both modes,
three sets of exploits (AND nodes of the BAG) are defined. The first is the set
of activated exploits $E_{ac}$; the second is the set of available exploits
$E_{av}$, which includes exploits whose preconditions are compromised and whose
belief exceeds a threshold; and the last is the set of blocked exploits
$E_{bl}$, which includes exploits that have been blocked by previous mitigation
actions. Consequently, we define the strict policy $P_S$ as:


$$P_S = (E_{bl})^{c} \cap E_{ac}$$

and the agile policy $P_A$ as:

$$P_A = E_{av} \cap E_{ac}$$
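The two policies reduce to plain set operations, as the following sketch shows.
It assumes that the complement of the blocked set is taken with respect to all
exploits in the BAG; the function names and the exploit identifiers are
hypothetical.

```python
def strict_policy(all_exploits, blocked, activated):
    """Activated exploits that have not been blocked by earlier actions."""
    return (all_exploits - blocked) & activated


def agile_policy(available, activated):
    """Activated exploits that are also available (belief above threshold)."""
    return available & activated


all_exploits = {"e1", "e2", "e3", "e4"}
activated = {"e1", "e2", "e3"}
available = {"e2", "e3", "e4"}
blocked = {"e3"}

strict = strict_policy(all_exploits, blocked, activated)
agile = agile_policy(available, activated)
```

With these sets the strict policy keeps e1 and e2, while the agile policy keeps
e2 and e3: the strict mode ignores the belief-based availability, and the agile
mode ignores whether an exploit was previously blocked.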


Depending on the selected policy, alerts are matched to one of the
aforementioned exploit sets, as long as enough information is available from
the IDS. The implemented POMDP model focuses on the current state of the
network, which allows the IRS to respond to attacks with short-term mitigation
actions; unlike other works, its applicability does not extend to
infinite-horizon optimal planning. Attack paths depicted in the BAG consist of
fewer actions than those of complex networks, so it is not necessary to develop
a system that attempts to think ahead of the adversary. To that end, the
system’s complexity is reduced by restricting the POMDP model to only one
simulation round.


4.4 Attack Strategies 


This work aims to address network-level threats and vulnerabilities relevant to
IoT and SOHO environments. The devices in these environments are characterized
by the variability of their operating systems and embedded technologies, which,
when paired with the current rapidly evolving computing environment, allows for
the creation of a multitude of attack vectors. Operational reliability,
confidentiality, and availability are among the most important security goals
to be considered when securing such systems, especially as even moderate
security controls are implemented at neither the host level nor the network
level, and as their users are not properly educated on how to configure and
secure them. Thus, in the current cyber-threat landscape these ecosystems are
prime targets of large-scale attacks, including IoT botnets and Trojans. This
section presents and analyses two characteristic attack scenarios associated
with IoT systems and SOHOs, which are examined further in the following
sections through real-world scenario simulations.


4.4.1 The Mirai Botnet


The first threat of concern is IoT botnets. For evaluation purposes, the Mirai
botnet was chosen, as at the peak of its activity it became a wake-up call to
the security industry [19], with an estimated 600,000 systems infected at the
peak of its initial outbreak [20]. Infected devices became the source of one of
the most severe distributed denial-of-service (DDoS) attacks of the recent
past, targeting the French web host OVH with a peak traffic volume of 1.1 Tbps
[21]. The disclosure of its source code, instead of leading to its eradication,
significantly increased the number of attacks [22] and became the starting
point for the creation of more resilient variants [19].

Mirai comprises four components:


• The bot executable, which is responsible for infection through dictionary
attacks, using common pairs of usernames and passwords, against
misconfigured IoT devices.

• The report server, which maintains the database of the botnet, handles
incoming reports about infected devices and acts as one of the two
intermediary entities between the C&C server and the bot. Communication
between the bot and the report server takes place through the Tor network,
making its detection a challenging task.

• The C&C server, which is the central unit, providing a botnet management
interface to the attacker while allowing the execution of infection and
attack commands.

• The loader, which operates as another intermediary entity between the C&C
server and infected devices, sending malicious binaries to victims according
to the server’s infection commands.


The detection of Mirai depends heavily on the network intrusion detection
system (NIDS) used for signature-based detection on the packets transmitted in
the IoT environment. The attack can be detected during three distinct actions:
(a) during the infection of a new victim, (b) during the DDoS attack, and/or
(c) during the transmission of a malicious binary between the loader and the
infected victim. Regarding the DDoS attack, Mirai is able to use ten attack
variations, including HTTP flood, SYN flood, UDP flood, and ACK packet flood.
Most of them, however, can be easily detected by a NIDS.

During the execution of the Mirai attack scenario, the IDS at the gateway is
expected to generate a number of alerts about the suspicious traffic, which the
IRS then processes to generate firewall rules that block it. Depending on the
alerts, the most suitable response is determined by the POMDP model, which
formulates a strategy that not only solves the problem but also considers how
every generated action will affect availability in the SOHO or IoT environment.


4.4.2 Zero-Day Attacks 


Zero-day attacks exploit undisclosed and unpatched vulnerabilities, which have
no available countermeasures or known mitigation actions at the time of
exploitation. Especially in IoT environments, the wide variety of communication
devices (regardless of their operational technology) and their anticipated
integration constitute a complex and diverse system that is independent of
human intervention. As a result, security patches and mechanisms are not always
handled as they should be. As noted by [23], important features required in IoT
applications allow access to the entire network when exploited. The same holds
for zero-day attacks in SOHO environments where vulnerable devices are present
[24].

As in the Mirai scenario, detection of zero-day attacks relies heavily on the
NIDS and its mode of operation. This type of exploitation is often accompanied
by suspicious network packet payloads, which makes detection feasible to a
certain extent. Nonetheless, the zero-day exploitation step often does not
reflect the attacker’s final goal, but rather the first step of a multi-stage
attack (an attack path on the BAG). An attacker in this case may simply take
advantage of any available vulnerabilities and pivot from host to host until
the desired goal condition is reached. A more sophisticated attacker, on the
other hand, may take an alternative path with respect to speed and feasibility.
Zero-day attacks are investigated by taking into account future weighted
transitions when computing the belief metric of the corresponding attack state.
Received alerts direct the IRS towards an optimal response that is related to
the attacker’s state in the graph, in accordance with neighboring exploitation
nodes.


4.5 Experimental Setup 


The IRS implementation described in the previous sections was evaluated in a
realistic simulated SOHO environment that included the devices presented in
Table 4.1.

In addition, a number of external devices are located in the WAN, from where
the SOHO’s gateway is reachable at 172.16.4.36. The Mirai external core
components (C&C, loader, and report server) are located at 172.16.4.21, while
the DDoS target is located at 172.16.4.26. The Zeus C&C server is located at
172.16.4.67 and is discussed further in Section 4.6.


Table 4.1. Overview of the SOHO environment.


Device Name       IP address            Description

Gateway           192.168.0.1           In addition to its gateway functionality, it
                                        hosts a Suricata IDS instance along with the
                                        network discovery tools.

IRS               192.168.0.3 & .4      The two halves of the IRS implementation
                                        (attack graph engine and decision-making
                                        engine respectively).

DHCP              192.168.0.7           Dedicated DHCP stand-alone server.

Android Device    192.168.0.9           Docker-Android Image¹ running in an
                                        Ubuntu virtual machine.

Windows XP        192.168.0.36          General purpose Windows XP machine acting
                                        as an attack target (with service pack 3
                                        installed).

Windows 7         192.168.0.17          General purpose Windows 7 machine acting
                                        as an attack target (with service pack 1
                                        installed).

Metasploitable 2  192.168.0.20          An intentionally vulnerable Ubuntu device²
                                        designed for remote and local exploit testing.

BusyBox           192.168.0.21 & .35    A software suite implementing a number of
                                        basic Unix utilities commonly used on IoT
                                        embedded devices. Two instances are deployed
                                        in the same Ubuntu virtual machine as the
                                        Docker-Android device.


1. https://github.com/budtmo/docker-android

2. https://sourceforge.net/projects/metasploitable


4.5.1 The Mirai Attack Scenario


To further demonstrate the IRS evaluation procedure, the execution of the Mirai 
attack scenario will be presented in detail, while an overview is given in Figure 4.3. 
This attack scenario involves a Mirai-infected BusyBox host inside the SOHO 
network at 192.168.0.21, communicating with the external Mirai components at 
172.16.4.21 to perform a DDoS attack on 172.16.4.26. 

According to [19], the bot normally launches a dictionary attack against TCP
ports 23 and 2323 (associated with the Telnet protocol), using a list of common
default username/password pairs to establish a connection and gain shell
access. This scenario begins with the x86/x64 bot binary being uploaded to the
targeted BusyBox SOHO host, with the gateway’s IDS and the IRS both operating
normally. Generated information is reported back to the Mirai report server. At
this point, the attacker may scan for potential targets by sending ARP requests
in order to discover the SOHO’s topology.


Figure 4.3. Mirai DDoS attack execution.

Afterwards, the attack begins with the attacker selecting the attack parameters
and confirming the action. As mentioned in Section 4.4.1, there is a wide set
of DDoS attacks that the attacker can choose from; in this case, the SYN flood
attack is used. The infected host starts attacking the external machine by
repeatedly sending SYN packets, in an attempt to open as many TCP connections
as possible and exhaust the target’s resources.

In this demonstration, the SOHO is monitored by the Suricata³ signature-based
IDS, so mitigation depends on the analysis of captured packets that pass
through the gateway. However, the Mirai bot communicates with the server-side
components through Tor, which makes detection difficult. During the course of
the attack, received alert messages of event_type = alert are consumed by the
decision-making engine.
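Selecting these messages from Suricata's line-delimited JSON output can be
sketched as follows. How the IRS actually receives its alerts is not described
here, so the generator below, the in-memory sample log, and the signature text
are assumptions; only the `event_type` value comes from the scenario.

```python
import json


def alert_events(lines):
    """Yield the parsed events whose event_type is "alert"."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue                      # skip truncated or malformed lines
        if event.get("event_type") == "alert":
            yield event


# Hypothetical log excerpt: one flow record, one alert, one malformed line.
sample_log = [
    json.dumps({"event_type": "flow", "src_ip": "192.168.0.9",
                "dest_ip": "192.168.0.1"}),
    json.dumps({"event_type": "alert", "src_ip": "192.168.0.21",
                "dest_ip": "172.16.4.26", "proto": "TCP",
                "alert": {"signature": "hypothetical SYN flood signature"}}),
    "not json",
]

alerts = list(alert_events(sample_log))
```

The source and destination addresses of each retained event are what would
then be matched against the security states of the BAG.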

The IDS generated alerts for the following three actions:


• Target discovery using ARP packets.

• Attempted infections of LAN devices with a dictionary attack
(192.168.0.21).

• SYN flood attack on the external target device (192.168.0.21 →
172.16.4.26).


These alerts initiated the IRS decision-making process, which resulted in 120
different security states in the GNSM. In total, one mitigation action was
selected: a global rule blocking all communications originating from the
Mirai-infected host, which was able to prevent the DDoS attack:


iptables -A INPUT -s 192.168.0.21 -j DROP
iptables -A OUTPUT -s 192.168.0.21 -j DROP


3. https://suricata-ids.org


Figure 4.4. Attacker’s beliefs for every security state.


Although detection restrictions were encountered, the IRS became more certain
of the security state beliefs and the attacker type beliefs over time, which
resulted in the persistent blocking of the infected host at 192.168.0.21 with
the global firewall rule. The attacker type belief throughout the execution of
the scenario is presented in Figure 4.4. Initial uncertainty about the attacker
type can be seen in the leftmost part of the graph, because the attacker’s
intentions were not clear during the first few rounds. As the attack
progressed, the attacker type belief quickly approached near-certainty, with
the POMDP assuming that the attacker follows the behavior assigned to attacker
type 1 (the least stealthy of the three). Similarly, the defender’s belief on
the security states is updated with increasing certainty. The belief
computation time is displayed in Figure 4.5.

Once the gateway applies the firewall rule, the bot is prevented from infecting
new prospective victims in the SOHO and from taking part in any DDoS attack. In
addition, this response prevents the attacker from communicating with the bot,
and the bot is likewise unable to send the relevant reports back to the report
server.


Figure 4.5. Belief computation time per security state.


4.6 IRS Evaluation


The IRS has been evaluated against five attack scenarios in total. The first
one, the Mirai scenario, was described in the previous section. The remaining
four scenarios are: a zero-day attack simulation of the vsftpd exploit⁴ in the
SOHO testbed, replayed network-level attacks, and the BlackEnergy and Zeus
botnets.

Specifically, the zero-day vulnerability that has been exploited is a backdoor
in the vsftpd v2.3.4 binaries, which opens a remote shell on TCP port 6200.
This vulnerability is present in the Metasploitable 2 virtual machine that
resides inside the SOHO and is triggered during the FTP login process when a
specially crafted username is entered. In order to simulate a zero-day attack
effectively, all information about the vulnerability and its corresponding
signatures is removed from all cyber-defence components.

During the testing and evaluation phase, datasets of pcap files were generated
from realistic malware traffic in the SOHO environment, including user
enumerations, brute-force attacks and Metasploit exploits. The complete list
includes: (a) a Java RMI backdoor, (b) a distcc_exec backdoor, (c) an
UnrealIRCD backdoor, (d) a Web Tomcat exploit, (e) Ruby DRb code execution,
(f) Hydra FTP brute force, (g) Hydra SSH brute force, (h) a vsftpd exploit,
(i) SMTP user enumeration and (j) a NetBIOS-SSN remote code execution
vulnerability. Most of these attacks are carried out against the
Metasploitable 2 virtual machine.

The third attack scenario is the BlackEnergy botnet, whose purpose is to
initiate remote DDoS attacks. The malware hides its processes in system drivers
and evades detection through obfuscation techniques. The chosen DDoS attack was
a SYN flood, which launched multiple synchronization requests against a device
external to the SOHO. Furthermore, the botnet is managed through a C&C server
external to the SOHO, which is responsible for issuing commands. The target of
the infection step was the Windows XP machine (192.168.0.36) of the testbed,
while the malicious executable was transmitted over HTTP.


4. https://www.exploit-db.com/exploits/49757

The Zeus botnet is the last attack scenario of the evaluation. Zeus is a widely
known banking trojan whose main purpose was to capture credentials through web
injects and keystroke logging, but which also had the ability to form botnets.
Each Zeus-infected host communicates with an external C&C server for periodic
reports and when requested by the botnet operator; all communications are
encrypted using the RC4 algorithm and take place over HTTP. The botnet has two
distinct steps at which detection is possible [17] and which lead to the
generation of NIDS alerts: (1) the infection step, where the bot initiates its
communication with the C&C server, and (2) the establishment of a TCP
connection with the C&C server, over which the aforementioned reports are sent
to the attacker. Moreover, in the last step, the attacker is able to execute
commands on the infected machine (e.g. to capture a screenshot of the desktop
or to download and execute other programs). The Windows 7 machine
(192.168.0.17) inside the SOHO is the target of this scenario.


4.6.1 Configuration Options 


Sixteen configurations were evaluated on the aforementioned five attack
scenarios to determine the effectiveness of a number of the IRS’s features. The
most important configuration options are:


• The use of CVSS-based or pre-set metrics to calculate the initial host risk
and the probabilities of exploitation attempt (from OR → AND nodes) and
success (from AND → OR nodes).

• The belief threshold at which exploit nodes (AND) are considered to be
compromised by the attacker. More specifically, this threshold controls which
nodes will be included in the $E_{av}$ set (see Section 4.3.4). Initially, all
OR and AND nodes of the BAG are assigned a belief of 0, while LEAF nodes that
represent an attacker’s ability to execute code on a host are assigned a
belief of 0.5 and all remaining LEAF nodes are assigned a belief of 1.

• Whether the response generator will produce both specific firewall rules
(targeting a specific port and protocol) and global firewall rules (blocking
all connection attempts of a host), or whether it will be restricted solely to
global firewall rules. This option effectively restricts the repertoire of
remediation actions available to the defender.

• The alert management policy, strict or agile, which controls the alert
matching process and whether the belief state of the IRS will be overridden by
the reception of alerts (strict policy) or whether it will be taken into
account (agile policy).
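Treating each of the four options above as a binary choice gives 2^4 = 16
combinations, which the following sketch enumerates together with the initial
belief assignment. The option names, their ordering, and the dictionary layout
are assumptions made for the example, and the numbering of the generated
configurations is not claimed to match Table 4.2; the threshold values 0.5 and
1 follow the values that appear in that table.

```python
from itertools import product

# Hypothetical option names; each option has two possible values.
OPTIONS = {
    "metrics": ("cvss", "preset"),
    "threshold": (0.5, 1.0),
    "global_rules_only": (True, False),
    "policy": ("strict", "agile"),
}

configs = [dict(zip(OPTIONS, values)) for values in product(*OPTIONS.values())]


def initial_belief(node_type, grants_code_execution=False):
    """Initial belief of a BAG node, following the assignment described above."""
    if node_type in ("OR", "AND"):
        return 0.0
    # LEAF nodes: 0.5 if they express the attacker's ability to execute code.
    return 0.5 if grants_code_execution else 1.0


initial_values = [
    initial_belief("OR"),
    initial_belief("AND"),
    initial_belief("LEAF", grants_code_execution=True),
    initial_belief("LEAF"),
]
# A belief is a probability, so none of the initial values can exceed 1.
exceeds_high_threshold = any(value > 1.0 for value in initial_values)
```

The last line makes explicit that no initial belief is strictly greater than a
threshold of 1.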


4.6.2 Evaluation Results


The evaluation of the IRS was performed against all five scenarios, with each
scenario repeated sixteen times, once for each configuration. The results of
the evaluation are summarized in Table 4.2.


Table 4.2. IRS Evaluation results.


Configuration Scenario 
= 
o 3 E] 

ÉE 2 % r 
$ ÈE g 5 : 
Sg 3g 2, $ e B 
Be 2 5 Egg a4 8 
#20 G0 = 3 N N ee a 
1 0.5 True Strict vV vV x 9/10% v 
2 Agile v v x 10y% v 
3 n False Strict vV è v v 8/10V Vv 
4 1 Agile v v x 8/l0V Vv 
5 8 1 Tue Sit V V x 0V Vv 
6 È Agile x x x O/10x x 
7 False Strict vV è v x 9/10% Vv 
8 Agile x x x 010x x 
9 0.5 True Strict V [vV x 9/10% v 
10 Agile Vo v x 10y% vy 
ll % False Strict v è v v 10/10% v 
2 Š Agile V vV x 10/10% v 
133 Ê 1 True Stie V V x 0V% V 
4° Agile x x x  O/10x x 
15 False Strict Vo [v *x 9/10% Vv 
16 Agile x x x 010x x 


v and x indicate that the attack was successfully and unsuccessfully mitigated,
respectively.

* indicates that a specific rule (targeting a specific port and protocol) was
used to mitigate the attack.


The combination of a high compromised threshold and the agile alert management
policy made configurations #6, #8, #14 and #16 consistently unsuccessful. The
high threshold did not allow any exploit (AND) node to be considered
compromised, as none of the LEAF node beliefs managed to exceed it, and the
agile policy did not override the belief to consider the nodes matched from the
IDS alerts.

For the remaining configurations:


• The Mirai, Zeus, and BlackEnergy scenarios: alerts were matched to the BAG
using the broadest criteria available, which forced the decision-making engine
to choose global firewall rules regardless of the configuration options. This
is because these scenarios initiate communications over dynamically assigned
ports, as they all use the operating system’s API, which opens a random port
with each call. These changes are rapid enough that, in order to capture the
changing ports in the resulting GNSM, the GNSM generation process would have to
be repeated several times per minute, which is neither optimal nor currently
feasible.

• The zero-day scenario: (a) for configurations #3, #11, and #15, IDS alerts
were correctly matched to TCP port 5000, which resulted in the choice of a
specific rule that blocked communications of all hosts with the router over
TCP port 5000, and (b) for the remaining configurations, alerts regarding the
exploitation of the zero-day were received but were incorrectly matched to an
entirely different part of the BAG, leading to the choice of incorrect
mitigation actions. This is a result of the lack of information about the
exploited vulnerability.

• The replay scenarios: those that were unsuccessfully mitigated were (a) the
Java RMI backdoor (failed twice), (b) the Ruby DRb code execution (failed
once), (c) the SMTP user enumeration (failed five times), (d) the Web Tomcat
exploit (failed twice), and (e) the UnrealIRCD backdoor (failed twice). Again,
IDS alerts were received during the execution of these scenarios but, as with
the zero-day scenario, they were incorrectly matched to the BAG.


4.7 Conclusions 


Moving target defense (MTD) is undoubtedly a field that includes many different
implementations addressing the same problem with diverse technologies and
mechanisms. The defender-attacker battle is a never-ending game, which means
that foolproof security will never be accomplished in any system, especially in
small and often unattended networks. MTD therefore attempts to provide a
security defense framework with sufficient effectiveness.

In this work, a scalable solution that has been tested in a realistic SOHO
environment and efficiently addresses the aforementioned situation was
presented. The IRS presented in this work is based on a GNSM generated by the
MulVAL framework, which is converted to a BAG in order to perform risk analysis
and form the basis for the decision-making process. The decision-making engine
implements the POMDP model presented in [4], with heavy modifications to better
address unknown and network-level attacks. Among those modifications is the
implementation of an alert policy that is able to consider threats across all
of the GNSM’s possible states.

To evaluate the effectiveness of the IRS implementation against realistic
situations, such as a Mirai botnet attack, five attack scenarios (Mirai, Zeus,
zero-day, 10 malicious traffic replays, and BlackEnergy) were executed in a
simulated SOHO environment. Sixteen IRS configurations were tested in order to
determine the optimal configuration, test the effectiveness of the
aforementioned modifications, and identify the system’s limitations.

In the end, the system was highly effective against more traditional threats,
such as Mirai, Zeus, and BlackEnergy; however, its effectiveness against novel
threats (i.e. zero-days), although slightly increased, is somewhat lacking.
This work is a starting point for future work, as a number of limitations were
identified in the process, including: (a) the inability of the IRS’s GNSM to
correctly model the state of a network with rapid changes to its topology (e.g.
newly connected devices) or to host attributes (e.g. newly opened ports); and
(b) the incorrect matching of IDS alerts to the GNSM observed during the
zero-day and some of the replay scenarios, which caused many of the
effectiveness penalties during the execution of these scenarios.


The Internet of Things (IoT) ecosystem is composed largely of heterogeneous
internet-based devices, including sensors, smart devices, and other
industrialised modules, which generate an enormous volume of data every day.
However, the complexity of the IoT ecosystem and the quantity of IoT devices
available have dramatically increased the volume of both emerging and
persistent security vulnerabilities from edge to cloud computing
infrastructure, principally due to security problems arising from embedded
devices and other legacy hardware. Further, with the emergence of IoT
technologies, malware campaigns and criminally motivated actors are
increasingly exploiting these underlying services and existing vulnerabilities.
In the Cyber-Trust project, we aim to address these security issues to support
the growth of the IoT ecosystem while mitigating the resulting complexity and
vulnerability when protecting IoT devices. This chapter presents an overview of
the IoT device profiling and threat detection solution proposed by Cyber-Trust
to tackle the grand challenges of securing the ecosystem of IoT devices. In
addition, the effectiveness and performance of the proposed solution are
verified in depth, especially against botnets and zero-day attacks.


5.1 Introduction: Background and Related Work 


5.1.1 Major Cyber Threats to IoT


The growing adoption of Internet of Things (IoT) technologies results in a more
intelligent and connected world. According to the latest IoT statistics [1],
there were more than 10 billion active IoT devices in 2022. Further, it is
estimated that by 2025 more than 152,200 IoT devices will be connecting to the
Internet per minute, and the amount of data generated by these devices is
expected to reach 73.1 ZB [1]. However, connecting this large number of IoT
devices globally, most of which are readily accessible and easily compromised,
allows hackers and malicious actors to use them as the cyber-weapon delivery
system of choice in many of today’s cyber-attacks, ranging from botnet building
for launching distributed denial-of-service attacks to malware spreading and
spamming [2].

On the other hand, IoT devices are inherently constrained in terms of
computation, battery power, connectivity (which is often intermittent), and
supported network protocols. These limitations hinder the execution of complex
security tasks and make the devices vulnerable to a range of attacks such as
malware, data leakage, spoofing, disruption of service (DoS/DDoS), energy
bleeding, insecure gateways, injections, ransomware and device hijacking [3],
leading to significant security and safety concerns that could potentially put
human lives at stake [3-5].

IoT security has become an increasingly prominent topic during the last few years,
especially with the rise in security incidents involving smart connected devices.
In this context, the Open Web Application Security Project (OWASP) IoT project,
run by a volunteer community of security professionals, investigates the most
critical IoT vulnerabilities that hackers can exploit as a basis for all kinds of
malicious behaviour, including DDoS attacks, malware distribution, spam campaigns,
phishing, fraud and data theft, among many others. Furthermore, this project intends
to help smart device manufacturers, developers, organisations, and customers better
understand the ongoing IoT security risks and take appropriate actions to mitigate
them. According to the latest report released by the OWASP IoT project [6], the most
severe IoT threats for 2018 are:


1. Weak passwords: According to the report, weak, guessable, or hardcoded
passwords are the Achilles heel of IoT security. If login credentials are not
changed from their default setting, a simple brute-force attack can easily be
used to compromise these devices and use them to launch large-scale attacks
on critical cyber-infrastructures.

2. Insecure network services: This is another big issue in IoT networks,
whereby standard network services running on the devices, such as Telnet,
SSH and insecure HTTP protocols, represent significant security issues that
manufacturers have not considered. Each open port on a smart device provides
a new opportunity for malicious actors to gain access to the device [7].

3. Insecure interfaces: Standard interfaces used to communicate with connected
devices are not always secured. This includes web interfaces, cloud APIs and
mobile interfaces. An insecure interface ecosystem eventually leads to device
compromise through vulnerabilities at this level, such as weak encryption,
insufficient data filtering and weak authentication methods.

4. Insecure update mechanism: Many IoT security issues are related to the lack
of secure update mechanisms, such as missing automatic updates and missing
notifications of security changes. Therefore, IoT device manufacturers should
provide periodic security updates/patching to guarantee the security of their
devices.

5. Usage of insecure and outdated components: Some manufacturers use off-brand
devices and insecure software components/libraries to build cheaper IoT
devices. However, this practice also brings many vulnerabilities to end-users
and creates an entry point for potential cyber-attacks. According to
Symantec [8], supply chain attacks are a massive part of the threat landscape,
having increased by 78% in 2019.

6. Privacy issues: Insecure storage, processing, and disclosure of personal data
without express consent can lead to many privacy issues and even compromise
the safety of people in the physical world. Moreover, the privacy policy
statements of some IoT service providers are unclear about the data collection
and do not identify the system capabilities.

7. Insecure data storage and transfer: Usually, data collected by smart devices
moves across a network or is retained in a third-party location (e.g., cloud
storage). Thus, the potential for it to be compromised increases, especially
without efficient encryption and access control for the device's sensitive
data and its transfer.

8. Lack of device management: IoT management introduces a host of challenges
related to security, where most devices connected to a network are missing
efficient security management, such as system monitoring and update/patching
mechanisms, which makes them attractive targets for cyber attackers.

9. Insecure default setting: Most IoT devices are shipped with an insecure
default configuration and restricted modifications. However, keeping the
default settings, such as default passwords, will create serious security
risks, not only to the device but also to the whole network.

10. Lack of physical hardening: Physical hardening is one of the most critical
aspects of IoT security, as physical access can be disastrous to devices
and allows potential attackers to gain sensitive information (e.g., embedded
passwords), insert malicious code and even rewrite the device's firmware.


All these security issues, and many others, make IoT devices easy targets for hackers
and malicious actors, who can even use them as a means of launching massive
cyber-attacks such as Distributed Denial of Service (DDoS) attacks [5]. Thus, there
is a crucial need for new techniques specially designed for IoT environments to
identify and mitigate potential IoT-related security attacks that exploit some of
these security vulnerabilities. In the following sections, we present a comprehensive
review of the latest techniques designed for IoT device profiling and threat
detection in IoT.


5.1.2 IoT Threat Detection Methods


Several studies in recent years have attempted to design new intrusion detection
systems that can identify potential cyber-attacks in IoT networks. These techniques
are classified into two main categories: signature-based and behaviour-based (also
called anomaly-based) detection techniques. Signature-based methods are the simplest
techniques to detect intrusions and cyber-attacks, and they are highly effective
against known threats. They rely on datasets of signatures (or patterns) of known
malware. A signature includes information (e.g., a cryptographic hash) that can
uniquely identify the malware (attack) [9]. The current activity of the network is
compared against the signatures to identify potential attacks. If the network traffic
signature corresponds to any one of the existing signatures, it is considered
malicious, and further defence actions are performed [7]. These techniques provide
accuracy rates of up to 100% in detecting known attacks; however, they cannot detect
unknown and new attacks (Zero-day attacks), which do not have corresponding
signatures [9]. Attackers exploit this limitation by using obfuscation techniques
to change the attack signature and avoid detection.
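The signature-matching idea described above can be illustrated with a minimal sketch. It assumes that a signature is simply the SHA-256 digest of a payload; the signature set and the sample payloads are hypothetical and are not taken from any real detection product.

```python
import hashlib

# Hypothetical signature database: SHA-256 digests of known-malicious payloads.
KNOWN_BAD_PAYLOADS = [b"malicious-payload-example"]
SIGNATURES = {hashlib.sha256(p).hexdigest() for p in KNOWN_BAD_PAYLOADS}


def is_known_malicious(payload: bytes) -> bool:
    """Return True if the payload's digest matches a stored signature."""
    return hashlib.sha256(payload).hexdigest() in SIGNATURES


# An exact copy of a known sample is detected ...
print(is_known_malicious(b"malicious-payload-example"))   # True
# ... but a trivially modified (obfuscated) variant is not.
print(is_known_malicious(b"malicious-payload-example!"))  # False
```

The second call shows why hash-based signatures miss obfuscated or previously unseen samples: changing a single byte yields a completely different digest.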

Anomaly-based detection techniques have been proposed to tackle the limitations
of signature-based detection methods. These methods monitor the network activity
against a defined set of requirements that form a baseline model of the expected
behaviour of the network. Any deviation from this baseline profile is considered an
anomaly and initiates appropriate defensive actions. Anomaly-based detection
techniques generally start by collecting information that can differentiate the
expected behaviour of the network from the abnormal one. Then, this information is
used to train a machine learning classifier to detect potential attacks [9]. In this
context, the predictive accuracy of many supervised and unsupervised learning
algorithms has been studied in several research works [10-13]. For instance, Verma
Abhishek et al. [11] studied the performance of different supervised learning
algorithms in securing IoT devices against DDoS attacks. The studied algorithms are
Random Forest (RF), AdaBoost (AB), Extreme Gradient Boosting (XGB), Gradient
Boosted Machine (GBM), Extremely Randomized Trees (ETC) and Multilayer
Perceptron (MLP). The experimental results showed that the MLP classifier, trained
on the feature set derived from the feature selection method, outperforms all other
classifiers with an 83% accuracy rate, a 90% True Positive (TP) rate and a 23%
False Positive (FP) rate.

The effectiveness of deep learning algorithms has also been investigated in many
research studies. These techniques offer a powerful new paradigm that can
automatically extract the features required to build the network profile from big
data without being explicitly programmed [14]. For instance, the Recurrent Neural
Network (RNN) has been used in many research studies to model the network
activities for intrusion detection in IoT [15], especially its two main variants,
Long Short-Term Memory (LSTM) [16] and Gated Recurrent Unit (GRU). Furthermore,
the Convolutional Neural Network (CNN), which has achieved great success in image
classification, has also been used in many intrusion detection methods for IoT
networks [9, 17]. Results from many studies show that deep learning can
significantly improve the accuracy of intrusion detection. For instance, the method
proposed in [17] achieved an average accuracy of 98.9%. Another essential benefit
of these techniques is that they can potentially identify Zero-day and unforeseen
attacks; however, they have higher false-positive rates. Table 5.1 presents examples
of the learning algorithms used in intrusion detection methods for IoT and the
achieved results in terms of accuracy, FP and TP.


5.1.3 IoT Device Profiling Methods


Generally, profiling of IoT devices refers to monitoring and recording data that can
be retrieved from different sources (e.g., IoT devices, network assets) to
characterise the individual behaviour of IoT devices connected to the network. In
this context, the abnormal behaviour of IoT devices can be identified by comparing
the current activities of the devices with an existing profile built from historical
activities over a set period. If the current behaviour deviates sufficiently from
the pre-defined normal one, it is considered a potential attack and initiates
appropriate defensive actions [19]. Usually, the profiling process can be performed
at both the IoT device level and the network level (i.e., network profiling) to
retrieve information from the end-user devices and the network assets (e.g.,
gateways), respectively.

Table 5.1. Popular learning algorithms used in intrusion detection methods for IoT.

Study                     Classification                  Test Dataset                Best Results

V. Abhishek et al. [11]   Random Forest (RF), AdaBoost    CIDDS-001, UNSW-NB15,       MLP: Accuracy 83%,
                          (AB), Extreme Gradient Boosting NSL-KDD                     TP 90%, FP 23%
                          (XGB), Gradient Boosted Machine
                          (GBM), Extremely Randomized
                          Trees (ETC), Multilayer
                          Perceptron (MLP)

K. K. Sai et al. [12]     SVM, Naïve Bayes, Decision      Sensor480 with 480 samples  Decision Tree: Accuracy 100%
                          Tree, AdaBoost

Z. Marzia et al. [13]     Radial Basis Function (RBF)     Kyoto 2006+                 RBF: Precision 90%

R. Bipraneel et al. [15]  Recurrent Neural Network (RNN)  NSL-KDD dataset             Accuracy: 89.00%

K. Jihyun et al. [16]     Long Short-Term Memory (LSTM)   KDD Cup 1999                Accuracy: 96.93%

G. Mengmeng et al. [18]   Feedforward Neural Network      BoT-IoT dataset             Accuracy: 96.82%
                          (FNN)

V. Huong et al. [17]      Convolutional Neural Network    IoT intrusion dataset with  Accuracy: 98.90%
                          (CNN)                           357952 samples


Several research works have presented proposals for profiling IoT devices using
different techniques, such as sensor fusion and SDA with Cloud Services, to monitor
the device usage and retrieve information about critical files and security status,
including patching status and firmware integrity [2, 20, 21]. However, in this
chapter, we focus on network-level profiling techniques. Network profiling refers
to the process of monitoring and logging all network activity by recording
information from the packet metadata, such as the source/destination IP of the
packet, start time, duration, sensor identity and the application-layer protocol
used [2]. IoT network profiling can be performed in six principal areas, with
open-source and commercial software that provides network operators with the tools
necessary to understand, control and manage the networks under their control. The
six principal areas, including examples of applications, are summarised in
Table 5.2 [22]. As shown in Table 5.2, several open-source and proprietary tools can
be used for network profiling and investigating potential cyber threats, such as
SiLK (System for Internet Level Knowledge), a highly scalable and robust toolset for
capturing and analysing network flow data. In addition, proprietary tools such as
NetFlow (Cisco), ntopng (ntop) and PRTG Network Monitor offer complete functionality
for their respective commercial offerings.


Table 5.2. Principal areas of network profiling with examples of tools [22].

Areas                                   Examples of Tools

Network Spoofing and Redirection        DNSMasq, Ettercap

Executable Reverse Engineering          Java Decompiler, .NET Reflector, IDA Pro,
                                        Hopper, ILSpy

Web App Testing                         Mitmproxy, Zed Attack Proxy, Burp Suite

Active Network Capture and Analysis     Canape, Canape Core, Mallory

Passive Network Protocol Capture and    Wireshark, SiLK, LibPCAP, TCPDump,
Analysis                                MS Message Analyser

Fuzzing, Packet Execution and           American Fuzzy Lop (AFL), Kali Linux,
Vulnerability Exploitation Frameworks   Metasploit, Scapy, Sulley

In the same direction, the Internet Engineering Task Force (IETF) has introduced
the Manufacturer Usage Description (MUD) specification to enhance IoT network
security by preventing IoT devices from having unrestricted access to the network
and only allowing them to connect to dedicated services [23]. To this end, MUD
requires that IoT manufacturers provide a behavioural profile of their devices. For
instance, an IP camera may need to use the DNS and DHCP protocols and to communicate
with a cloud-based controller and an NTP (Network Time Protocol) server. This
information can be used to generate a device-specific access control list (ACL) that
sets restrictions on this device and, therefore, reduces the potential attack
surface on the network. However, the MUD specification is still under development
and is therefore not yet implemented by manufacturers [23].
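To make the idea of a device-specific ACL concrete, the sketch below checks observed flows against an allow-list for the IP camera example. It is only an illustration of the principle: the data structure, host names and ports are hypothetical and do not follow the actual MUD file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """A simplified outbound flow observed at the gateway."""
    protocol: str   # "tcp" or "udp"
    dst_host: str   # destination host name or address
    dst_port: int


# Hypothetical behavioural profile for an IP camera (not real MUD syntax).
CAMERA_ACL = {
    ("udp", "dns.example.net", 53),          # DNS
    ("udp", "255.255.255.255", 67),          # DHCP
    ("udp", "ntp.example.net", 123),         # NTP
    ("tcp", "controller.example.com", 443),  # cloud-based controller
}


def is_allowed(flow: Flow, acl: set) -> bool:
    """Allow a flow only if it matches an entry of the device's ACL."""
    return (flow.protocol, flow.dst_host, flow.dst_port) in acl


print(is_allowed(Flow("udp", "ntp.example.net", 123), CAMERA_ACL))  # True
print(is_allowed(Flow("tcp", "203.0.113.7", 23), CAMERA_ACL))       # False
```

Any flow outside the declared profile, such as the Telnet connection in the second call, would be dropped or reported, which is how such a profile reduces the attack surface.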

On the other hand, many research works have proposed different IoT network
traffic profiling approaches [22, 24]. For instance, Jonathan Roux et al. [24] have
proposed an intrusion detection approach for IoT based on radio communication
profiling. The proposed solution targets cyber-attacks that may occur through
wireless communications by profiling and monitoring the Radio Signal Strength
Indication (RSSI) related to the wireless transmissions of the connected devices.
This information is collected by radio probes placed in the smart area (network).
Then, a neural network is trained to classify legitimate and illegitimate areas in
which devices usually communicate within the smart place. However, the proposed
solution is not fully implemented, and the paper does not provide information about
its detection performance (such as accuracy, false positives and false negatives).

In another work, Andrei Bytes et al. [24] have developed new software for automatic
feature profiling of IoT devices. The device profile is built from its technical
capabilities, such as the device firmware, the access mode of the device, the
network operation topology and the wireless interfaces. This information is
collected from different locations, including direct and indirect sources. The
created profile is then used to categorise and compare the security-sensitive
capabilities of IoT devices.


5.2 Cyber-Trust Detection Method 


The main goal of the Cyber-Trust project is to propose an innovative cyber-threat
intelligence gathering, detection, and mitigation platform to tackle the grand
challenges of securing the ecosystem of IoT devices. The proposed approach captures
different phases of emerging IoT attacks, before and after known or unknown
(Zero-day) vulnerabilities are exploited. This chapter focuses on the detection
phase, which involves two main components: network profiling and intrusion detection.


5.2.1 Network Profiling Approach 


The network profiling component, also known as the network repository, automatically
scans connected devices on the locally available network for potential common
vulnerabilities and currently running services. For each device connected to the
network, the list of potential vulnerabilities is collated from the public MITRE CVE
dataset and mapped to the available network services, which are discovered through
network port scanning tools such as Nmap. This information is then used to create
the device profile, together with other information such as the routing information,
the reported hostname, network flow, and topology. Based on the created profiles for
each device, the network profiling component computes the out-of-bound network
profile behaviour; this is calculated by continually monitoring the network traffic
flow from each device across the network. It utilises rate-informed heuristic
profiling to create an expected throughput pattern for each device on the LAN that
it is connected to. This profile is then compared against three different predefined
profiles, each obtained from a packet capture that is refreshed hourly (HP, Hourly
Profile), daily (DP, Daily Profile) or weekly (WP, Weekly Profile). The objective of
utilising different profiles separated and refreshed by period is to provide a more
accurate map of the network conditions that a device would experience over time.
This increases profile accuracy and makes the system more adaptable to variable
network conditions and varied device usage. The Rate Metric (RM) for these captures
is calculated as follows:


$$RM = \frac{n}{t} \qquad (5.1)$$


where n is the total number of bytes transmitted and t is the capture time. The
component can then take periodic network captures of the LAN traffic from the
gateway. Each new capture is run through the same profiling system as the timed
profile captures, and a new rate metric is calculated. Finally, a percentage
difference (Δ) is calculated, comparing the rate profile of the new capture to each
timed profile as follows:


$$\Delta = \frac{CRM - PRM}{PRM} \times 100 \qquad (5.2)$$

Delta (Δ) is the percentage difference between CRM, the calculated rate metric,
and PRM, the profile rate metric. If the delta value exceeds a threshold, which can
be configured per implementation depending on network volatility, the device's
network activity is flagged as out of profile, and a re-scan of the network is
initiated to look for any actively exploited attack surface on the network. This
process is fast and has minimal network impact: it will not degrade network
performance, even on a small network, as the scan automatically increases or
decreases in intensity depending on scan timings and throughput. In addition, the
threshold can be raised or lowered depending on how frequently scans are triggered;
for example, it can be increased on a network with a dynamic, variable load. The
traffic capture, stored in PCAP format, that the network profiling component uses to
calculate and profile each device can then be transferred to the machine learning
component to check the traffic for patterns that could indicate malicious traffic,
including active attacks or ongoing exploitation. This profile can then be used to
inform mitigation actions across the affected network.
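The rate-based check described above can be sketched in a few lines. This is a minimal illustration rather than the Cyber-Trust implementation: it assumes that the percentage difference of Equation (5.2) is taken relative to the profile rate, i.e. (CRM - PRM) / PRM x 100, and the function names, profile values and threshold are hypothetical.

```python
def rate_metric(total_bytes: int, capture_seconds: float) -> float:
    """Equation (5.1): bytes transmitted divided by the capture time."""
    if capture_seconds <= 0:
        raise ValueError("capture time must be positive")
    return total_bytes / capture_seconds


def percentage_difference(crm: float, prm: float) -> float:
    """Assumed form of Equation (5.2): difference relative to the profile rate."""
    return (crm - prm) / prm * 100.0


def out_of_profile(crm: float, profiles: dict, threshold_pct: float) -> dict:
    """Flag each timed profile whose delta exceeds the configured threshold.

    Only increases in rate are flagged here; treating drops symmetrically
    (e.g. with abs()) would be a further assumption not stated in the text.
    """
    return {name: percentage_difference(crm, prm) > threshold_pct
            for name, prm in profiles.items()}


# Hypothetical profile rates in bytes per second for one device.
profiles = {"HP": 1000.0, "DP": 1200.0, "WP": 1500.0}
crm = rate_metric(total_bytes=150_000, capture_seconds=60)  # 2500.0 B/s
print(out_of_profile(crm, profiles, threshold_pct=80.0))
# {'HP': True, 'DP': True, 'WP': False}
```

In this example the new capture is 150% above the hourly profile and about 108% above the daily profile, so both are flagged, while the weekly profile (about 67% above) stays within the 80% threshold.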


5.2.2 Intrusion Detection Method 


The Cyber-Trust project proposed a hybrid intelligent intrusion detection solution
for appropriate and effective detection of malicious cyber threats at the host and
network level. The proposed solution combines deep learning and image visualisation
techniques to quickly detect sophisticated and newly released cyber-attacks in IoT
networks. Deep learning is a powerful learning technique that has become
progressively dominant in various fields, including intrusion detection. Several
researchers have suggested applying image visualisation to intrusion detection
systems. In this context, Intel Labs and the Microsoft Threat Intelligence team
collaborated on a pertinent research project called STAMINA (Static
Malware-as-Image Network Analysis) [25], which converts binary input files into
grayscale images so that a deep learning algorithm can process and classify them.
The primary approach of this research project is to convert the content of an input
binary file into a simple stream of pixels and then reshape that stream into a 2D
image whose dimensions vary depending on aspects such as file size. Then, a trained
neural network classifier is used to analyse and classify the output image as
legitimate or malware. The learning algorithm is trained on a considerable amount of
real-world data (2.2 million PE file hashes) that Microsoft has collected from
Windows Defender installations. STAMINA has proven effective, with over 99.00%
accuracy in classifying malware and a false positive rate slightly under 2.6%.
However, it has its limits. For example, it works well with small files, but it
struggles with larger ones.
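The conversion of a binary file into a grayscale image can be sketched as follows. This is a simplified illustration of the general idea, not the STAMINA code: each byte is used directly as a pixel intensity, and the image width is a free parameter here, whereas the text above only says that the dimensions depend on aspects such as file size.

```python
from __future__ import annotations


def bytes_to_grayscale_rows(data: bytes, width: int) -> list[list[int]]:
    """Reshape a byte stream into rows of `width` pixels (values 0-255).

    The last row is zero-padded so that every row has the same length.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    padded = data + bytes(-len(data) % width)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


# Five bytes laid out on a 4-pixel-wide image give two rows.
print(bytes_to_grayscale_rows(b"\x00\x10\xff\x7f\x01", width=4))
# [[0, 16, 255, 127], [1, 0, 0, 0]]
```

The resulting rows can be handed to any image library or directly to a neural network as a 2D array of intensities.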

In the Cyber-Trust project, we have proposed an innovative intrusion detection
solution that converts network traffic into RGB images using the visual
representation tool BinVis. Then, the produced images are analysed and classified
using different learning algorithms, including Residual Neural Network (ResNet50),
Self-Organizing Incremental Neural Networks (SOINN) and MobileNet. Our approach was
announced on the first of April 2018, that is, two years before the announcement of
the STAMINA project. The main idea of the proposed solution is presented in our
research papers [7, 14] and [26].

Figure 5.1 shows the images produced from the network traffic using the
visualisation tool BinVis. The output image is created by assigning a specific
colour to each byte of the PCAP file; the bytes are then arranged into a 2D image
using the Hilbert space-filling curve, a clustering algorithm. The colour is
assigned to each byte depending on its ASCII character reference:

• Blue for printable characters

• Green for control characters

• Red for extended characters

• Black for the null character, or 0x00

• White for the non-breaking space, or 0xFF

Figure 5.1. (a) The Hilbert space-filling curve mapping and (b) the 2D image.
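The colour assignment and the Hilbert curve layout described above can be sketched as follows. This is an illustrative approximation, not the BinVis source code: the exact RGB shades, the byte ranges used for "printable", "control" and "extended" characters, and the function names are assumptions, and the curve index-to-coordinate conversion is the standard textbook algorithm.

```python
from __future__ import annotations

BLACK, WHITE = (0, 0, 0), (255, 255, 255)
BLUE, GREEN, RED = (0, 0, 255), (0, 255, 0), (255, 0, 0)


def byte_colour(b: int) -> tuple[int, int, int]:
    """Map one byte to a colour class (ranges are assumptions)."""
    if b == 0x00:
        return BLACK
    if b == 0xFF:
        return WHITE
    if 0x20 <= b <= 0x7E:
        return BLUE    # printable ASCII
    if b < 0x20 or b == 0x7F:
        return GREEN   # control characters
    return RED         # extended characters (0x80-0xFE)


def hilbert_d2xy(side: int, d: int) -> tuple[int, int]:
    """Convert index d along a Hilbert curve into (x, y) on a side x side grid."""
    x = y = 0
    t, s = d, 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:               # rotate the quadrant when needed
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def bytes_to_hilbert_image(data: bytes, side: int) -> list[list[tuple[int, int, int]]]:
    """Place the colour of byte i at the i-th position of the Hilbert curve."""
    if side <= 0 or side & (side - 1):
        raise ValueError("side must be a power of two")
    if len(data) > side * side:
        raise ValueError("data does not fit into the image")
    image = [[BLACK] * side for _ in range(side)]
    for d, b in enumerate(data):
        x, y = hilbert_d2xy(side, d)
        image[y][x] = byte_colour(b)
    return image


# 'A' (printable), newline (control), 0x90 (extended), 0xFF on a 2 x 2 image.
print(bytes_to_hilbert_image(bytes([0x41, 0x0A, 0x90, 0xFF]), side=2))
```

Because consecutive positions on the Hilbert curve are always adjacent in the image, bytes that are close together in the capture stay close together in the picture, which is why the curve is useful for exposing local structure to an image classifier.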


5.3 System Implementation and Testing 


5.3.1 Test Bed Setup 


In the smart home domain, the experiments were carried out in the Cyber-Trust
testbed, which involves a large number (750) of emulated and simulated Small
Offices/Homes (SOHOs). Each SOHO includes several virtualized devices and a
separate Ubuntu VM acting as a gateway. As shown in Figure 5.2, the network
profiling component is deployed in the gateway VM because it needs to communicate
with the smart home network (LAN) and collect information about the connected
devices. Conceptually, this component may reside on the smart home gateway for data
collection and communication or, given the additional computational requirements,
it may be relocated to a separate hardware device that is closely connected to the
smart gateway. The network traffic can be collected from the LAN and WAN interfaces
of the smart gateway and subsequently processed for storage using NetFlow. The
network infrastructure is inferred using a combination of discovery mechanisms
(Nmap specifically) and by querying the services on the smart gateway (from ARP and
DHCP leases to VLAN and routing information).

The intrusion detection component, which includes the machine-learning detection
module, is deployed in a separate VM running Debian GNU/Linux 10.2 at the ISP level
(WAN network). This component is deployed in a separate VM due to the computational
power required by the machine learning module. For the virtualized devices,
different OSs that are used in IoT devices were run in VMs or in dockerized form.
The smart home network configuration is done via the gateway VM, which is assigned
two interface cards; from here, we control the network assignments for both WAN and
LAN traffic. The interface card eth0 is referenced as NIC1 (172.16.4.1/24) and has
Internet connectivity (WAN). In contrast, the second interface, eth1, is referenced
as NIC2 (192.168.1.1/26) and acts as the gateway IP for the isolated smart home
network (LAN).


5.3.2 Test Dataset 


To test the proposed detection approach, we first created an initial dataset for
training the machine learning module. However, the overall training of the machine
learning algorithm is performed incrementally each time new malicious traffic is
found, that is, without discarding the information identified in earlier training
phases. This incremental learning significantly improved the detection accuracy of
the machine learning module. This dataset consists of more than 900 BinVis images
of malicious traffic sourced from multiple malware traffic analysis repositories.
Malicious PCAP files contain real malicious traffic generated by Trojans, botnets,
keyloggers, spyware and backdoors, to mention a few, while standard PCAP files
contain regular traffic captured with tcpdump from various clean devices in the
Cyber-Trust project testbed. The dataset of malicious PCAP files and their
corresponding BinVis images is publicly available on the open-access IEEE DataPort
website. Table 5.3 shows the percentage of malicious traffic samples in the
training dataset.

Figure 5.2. Implemented Testbed.


Table 5.3. Malicious traffic percentage according to type of attack.


Other


Malware Type Trojan DDoS Botnets Zero-day Exploits Backdoors Others


Percentage 33% 16% 19% 8% 6% 18%

For the testing dataset, we created our own collection of PCAP files from real
malware traffic in the Cyber-Trust testbed. More precisely, malicious PCAP files
were created from different real-world attack scenarios, including the Mirai
Botnet, the BlackEnergy Botnet, the Zeus Botnet, and an attack replay scenario that
consisted of several attack types: Java-RMI backdoor, distcc exec backdoor, Web
Tomcat exploit and Hydra brute-force attack. The PCAP files were generated by
running live demos of each attack scenario and recording inter-device network
communication using tcpdump.


Table 5.4. Metrics used in the testing.

Metrics              Description                                      Formula

Accuracy             Refers to the number of correctly predicted      $A = \frac{TP + TN}{TP + TN + FP + FN}$
                     samples out of all the samples.

False Positive Rate  Measures the rate of false alarms produced by    $FPR = \frac{FP}{FP + TN}$
                     the intrusion detection system.

False Negative Rate  Measures the rate of attacks not captured by     $FNR = \frac{FN}{FN + TP}$
                     the intrusion detection system.

Precision            Measures the percentage of positively            $P = \frac{TP}{TP + FP}$
                     classified samples that are truly positive.

Recall               Represents the proportion of malicious           $R = \frac{TP}{TP + FN}$
                     (positive) samples that were correctly
                     classified.

F-Score              The harmonic mean of precision and recall.       $F\text{-score} = 2 \times \frac{P \times R}{P + R}$


5.3.3 Testing Results 


5.3.3.1 Machine learning detection module 


Several tests were carried out to evaluate the success of the proposed intrusion
detection solution and determine its accuracy. The metrics used to investigate the
results of the ML module are Accuracy (A), False Positive Rate (FPR, or false
alarms), False Negative Rate (FNR), Precision (P), Recall (R) and F-score. In these
experiments, malicious traffic represents positive instances while normal traffic
represents negative instances. True Positive (TP) is the number of malicious
instances that have been correctly classified. False Positive (FP) is the number of
normal instances that have been incorrectly classified as malicious. True Negative
(TN) is the number of samples of normal traffic that have been correctly classified.
False Negative (FN) is the number of malicious PCAP files that have been incorrectly
classified as normal instances.

By replaying these PCAP files to the A04 component, we can assess these metrics with
quantifiable data. Figure 5.3 presents the overall results of the tests, which
reached an overall detection accuracy of 98.35%, a high rate that meets the accuracy
required in practical use. Over 100 test runs, the best accuracy (A) result was
98.35%, the false-positive rate was 0.98%, and the false-negative rate was 0.71%.
The precision (P) result was also very high, at 99.3%, which shows solid overall
confidence in the pattern recognition process. In these tests, precision is crucial
because False Negatives (FN), where malware traffic is considered normal, cost more
than False Positives (FP), where normal traffic is considered malicious. The recall
(R) was 99.01%, and the F1 value (F1) achieved was 99.16%.

Figure 5.3. Overall testing results.
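The metrics of Table 5.4 can be computed directly from the four confusion-matrix counts, as in the sketch below. The function names and the example counts are hypothetical; only the final check uses figures from this chapter, namely the recall of 99.01% and the precision of 99.31% quoted in the conclusion.

```python
from __future__ import annotations


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def detection_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Compute the Table 5.4 metrics; malicious traffic is the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "fpr": fp / (fp + tn),
        "fnr": fn / (fn + tp),
        "precision": precision,
        "recall": recall,
        "f_score": f_score(precision, recall),
    }


# Hypothetical counts: 100 malicious and 100 normal captures.
print(detection_metrics(tp=90, tn=95, fp=5, fn=10))

# Consistency check with the reported precision and recall.
print(round(f_score(0.9931, 0.9901) * 100, 2))  # 99.16
```

The last line reproduces the reported F1 value from the reported precision and recall, which confirms that the F1 figure follows from the harmonic-mean definition.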


5.3.3.2 Network profiling 


The proposed network profiling approach is used by the Cyber-Trust IoT platform
to dynamically and actively profile and monitor all network-connected devices to
detect IoT device tampering attempts and suspicious network transactions. During
the tests performed on the proposed solution, the threshold was set to 80% of the
percentage difference (PD) over the assigned 60 seconds of capture time. Such a
significant difference from the standard transmission rate in any capture was a
good baseline for our use cases. However, it is essential to note that the end-user
can configure this threshold to match their network use cases if their network
activity throughput is markedly more volatile or more stable than that of the SOHO
networks on which we tested the configuration. As shown in Table 5.5, during the
performed tests, the malicious samples were correctly detected as out-of-profile for
the affected devices, yielding a 100% detection success rate for the attacks tested.
Furthermore, by running the tests several times for both malicious and benign
network traffic, the best accuracy (A) result was 100%, with a false positive rate
of 8.3%.


Table 5.5. Results for each kind of attack.

Malware Type                    Out of Profile (Yes/No)   Detected From Profile   Δ%

Zero-day exploits               Yes                       D                       28.37%
DDoS attack with Mirai Botnet   Yes                       H, D, W                 98.53%
DDoS attack with Black Energy   Yes                       H, D, W                 128.42%
java_rmi                        Yes                       D, W                    96.88%
distcc_exec_backdoor            Yes                       D, W                    98.64%
UnrealIRCd                      Yes                       D, W                    97.69%
Tomcat                          Yes                       W                       395.52%
ruby_drb_code_exec              Yes                       D, W                    682.16%
hydra_ftp                       Yes                       D, W                    95.15%
hydra_ssh                       Yes                       D, W                    99.14%
Smtp                            Yes                       D, W                    93.50%
netbios_ssn                     Yes                       D, W                    307.39%
Zeus malware                    Yes                       W                       96.70%

H, D and W denote the hourly, daily and weekly profiles, respectively.


5.4 Conclusion 


In this chapter, we have introduced the Cyber-Trust approach for detecting
network-level attacks in IoT environments. The approach combines network profiling,
binary visualisation, and machine learning techniques for detecting advanced and
new threat vectors in IoT networks. The proposed solution was tested in the
Cyber-Trust testbed, which consists of many simulated and emulated smart home
networks. For the training and testing of the proposed solution, we have created a
new dataset that includes many 2D images corresponding to malicious and regular
network traffic collected from different sources. The malicious samples used in the
testing phase, in contrast, were created in the Cyber-Trust testbed from real attack
scenarios that cover a wide range of critical attacks, including botnet-based DDoS
attacks, Zero-day attacks, malware, exploits and backdoors. The dataset is now
publicly available and can be used by researchers in this field, which is especially
valuable given the lack of labelled data for testing machine learning algorithms.

The overall testing results are promising, especially those of the machine
learning component, which recorded an accuracy of 98.35% over 100 tests with only
a 0.98% FPR and a precision of 99.31%. These results were acquired from testing
against device exploitation through known and unknown common vulnerabilities and
against high-impact botnets that have seen extensive infection in the real world,
which speaks to the high efficacy of the solution. However, the overall accuracy
of the proposed solution could still be improved with further training. As future
work, tests could be performed to assess whether this model can increase its
accuracy with more extensive or alternative forms of binary visualisation training
and techniques. The network profiling has achieved good results: during the
testing, the attacks were identified as out-of-profile for the affected devices
based on the predefined threshold. The obtained results for this component could
be significantly enhanced during the next testing phase by running more samples
for an extended period.

IoT security has now emerged as one of the most important issues in network
security. Conventional security techniques, such as firewalls and signature-based
intrusion detection systems, have proven ineffective in protecting IoT networks
from increasingly sophisticated attacks and malware. Due to these limitations,
researchers have been compelled to build novel intrusion detection solutions
utilising various technologies such as IoT honeypots and Machine Learning (ML).
This chapter describes a novel approach to detecting malicious network traffic that
employs a honeypot and machine learning. The IoT honeypot system is used to gather
intelligence about attacks that target IoT devices. The data gathered are used to
understand the attackers’ tools, strategies and new techniques. They are also used
to train the machine learning model of the IDS on a continuous basis in order to
improve its detection accuracy. This method is most effective against unknown and
zero-day attacks on IoT devices.


6.1 Introduction


IoT devices seem to be almost everywhere these days, and they are increasingly
being used in critical infrastructure sectors such as healthcare, security, energy,
and emergency services. Each of these devices adds a new entry point to the
network, raising a growing security risk [1]. A single compromised device connected
to a network can pose a threat to the whole network and serve as a point of entry
for a wide range of hacking attempts [1]. In the current threat landscape, cyber
criminals’ techniques have advanced to the point where they are extremely difficult
to identify and remediate. According to a recent report by the University of
Maryland [2], attackers are now successfully breaching IoT devices every 39
seconds. Furthermore, security incidents confirm that the larger problem is that
these devices’ security flaws can be easily exploited by hackers to form vast
botnets (i.e., zombie armies) and launch significant DDoS attacks [3]. According to
A10 Networks’ most recent report, nearly 6 million DDoS attacks occurred in the
fourth quarter of 2019 [4]. This report confirmed that Mirai remains the malware
of choice for botnets, and that WS-Discovery has surpassed SNMP (Simple Network
Management Protocol) and SSDP (Simple Service Discovery Protocol) as the third
most popular source of DDoS [4]. Despite substantial efforts (and budgets) by
organisations and the security community to defend connected devices, attackers 
continue to devise new strategies to obfuscate their operation and avoid detection 
by cyber defence mechanisms [5]. Current signature-based Intrusion Detection Sys- 
tems (IDSs) are especially ineffective at detecting unknown and obfuscated malware 
for which no signatures exist. Furthermore, malware signatures must be updated 
on a regular basis [6], which requires significant resources and human involve- 
ment/expertise to create these signatures [6, 7]. As a result, innovative intrusion 
detection technologies have become essential for defending against these threats 
before they cause serious harm. 

In this chapter, we propose a hybrid intrusion detection solution that can enhance
the currently deployed IDSs for protecting IoT networks from intruders and from
obfuscated and zero-day threats, using machine learning and established honeypot
technology. The honeypot framework deliberately attracts hackers and uses their
intrusion attempts to learn more about malicious actors and how they operate.
Furthermore, raw data generated by the honeypot system is used for effective and
dynamic training of the machine learning model, increasing its detection accuracy.
The trained machine learning model is used to identify possible unknown cyber
security threats automatically. The remainder of the chapter is organised as
follows: Section 6.2 provides background on honeypots and surveys previous work in
this field that uses machine learning techniques and honeypot software. Section 6.3
then describes the proposed intrusion detection system and addresses the most
suitable strategies and algorithms for implementing it successfully. Finally,
Section 6.4 concludes the chapter and discusses future work.


6.2 Background and Related Work 


An Intrusion Detection System (IDS) is a security mechanism used to protect 
both the host and the network from potential threats that would normally pass 
through a typical firewall device [8]. IDSs have traditionally been classified into two 
types: host-based intrusion detection systems (HIDSs) and network-based intru- 
sion detection systems (NIDSs) [9]. HIDSs are commonly used to monitor and 
analyse the internal activities on a particular machine as well as the network
packets on its network interfaces. NIDSs, on the other hand, are used to
continuously monitor network traffic, searching for potentially malicious and
unauthorised activity that could compromise network security, and to take automatic
measures to mitigate it, such as sending alerts to the network administrator [8, 10].
NIDSs can be implemented in two ways: signature-based and anomaly-based. Most
security defence systems have used a signature-based classification method since the
early days of threat detection. This form of NIDS monitors network traffic and
compares it to a database of known threat signatures or attributes, in which a
pattern that defines each particular threat’s unique characteristics is stored so
that the same malware can be detected in the future [10]. Signature-based detection
techniques are typically very successful at detecting known malware, but they are
largely ineffective at detecting unknown and new malware for which no signatures
exist [11]. To exploit this limitation, modern attackers often mutate their
creations so that the file’s signature changes while the malicious functionality is
maintained. Polymorphic malware, for example, can create a new variant each time
it is executed, resulting in a new signature [9].
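
The weakness of signature matching against mutated samples can be illustrated
with a small sketch. It assumes, purely for illustration, that a signature is the
SHA-256 digest of the payload; real IDS signatures are usually richer patterns,
and the payload bytes below are made up.

```python
import hashlib


def signature(payload: bytes) -> str:
    """Illustrative signature: the SHA-256 digest of the whole payload."""
    return hashlib.sha256(payload).hexdigest()


# Hypothetical known sample and the signature database built from it.
known_sample = b"\x90\x90EVIL_PAYLOAD"
signature_db = {signature(known_sample)}


def is_known_malware(payload: bytes) -> bool:
    """Signature-based check: only exact matches with the database are detected."""
    return signature(payload) in signature_db


# A variant that appends a single padding byte keeps its behaviour in this toy
# setting, but its digest no longer matches any stored signature.
variant = known_sample + b"\x00"
print(is_known_malware(known_sample))
print(is_known_malware(variant))
```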

Due to the drawbacks of signature-based detection techniques, researchers are
now concentrating on anomaly-based detection approaches [9, 10]. This technique
classifies network traffic against a profile built by tracking the characteristics
of normal operation over time. The actual network traffic is then compared to this
predefined profile, and any major deviation from it is classified as malicious [9].
This approach is particularly effective for detecting unknown and obfuscated
threats [10]. With new forms of IoT threats emerging on a regular basis, many
methods and techniques for anomaly-based detection have been proposed in the
literature. Many of these approaches have examined machine learning (ML) [12], with
a focus on deep learning (DL) algorithms [13], which provide a powerful paradigm
for automatically determining the features needed for malicious traffic
detection [12]. More recent research has looked at the use of honeypots to improve
NIDSs. Honeypot strategies aim to shift the defence strategy against attacks by
allowing organisations to take the initiative [14]. The remainder of this section
gives more information about previous work on malicious network traffic
classification using machine learning, as well as background on honeypots and a
survey of honeypot-related work.
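
As a rough illustration of the anomaly-based idea, the sketch below builds a
profile from observations of normal operation and flags values that deviate
strongly from it. The choice of a mean and standard deviation profile, the factor
k = 3 and the sample values are assumptions made for this example and are not
taken from the cited works.

```python
from statistics import mean, stdev


def build_profile(samples):
    """Summarise normal behaviour by its mean and sample standard deviation."""
    return {"mean": mean(samples), "std": stdev(samples)}


def is_anomalous(value, profile, k=3.0):
    """Flag a value lying more than k standard deviations from the profile mean."""
    return abs(value - profile["mean"]) > k * profile["std"]


# Hypothetical packets-per-second readings taken during normal operation.
normal_rates = [100, 102, 98, 101, 99]
profile = build_profile(normal_rates)

print(is_anomalous(101, profile))  # close to the profile
print(is_anomalous(500, profile))  # far from the profile
```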

The use of machine learning to defend against intrusions in IoT networks has
recently gained a lot of attention in academia [15]. Usually, these techniques
examine the available network traffic information to extract features that can be
used to separate malicious traffic from legitimate traffic. The features are then
used to train the classifier to detect possible attacks, with each data instance
labelled as normal or anomalous. The output is usually binary, with two possible
values: normal or malicious traffic [9]. In this area, supervised learning
algorithms such as nearest neighbour classifiers, support vector machines, and
rule-based schemes such as decision trees and random forests have shown promising
results. The survey in [16] proposed a classification of learning-based intrusion
detection systems and assessed the performance of various supervised and
unsupervised learning algorithms used in this field in terms of accuracy and false
alarm rate. According to that survey, the most significant challenge for supervised
learning is the lack of accessible datasets with labelled data. According to a
study published in [17], current intrusion detection technologies for IoT networks
still need to be improved in terms of scalability, detection accuracy, true
positive rate, and energy consumption.
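
The evaluation measures mentioned above (accuracy, false alarm rate, true positive
rate) follow directly from the confusion-matrix counts of a binary detector. The
sketch below uses the standard definitions; the counts are invented for
illustration and do not come from any of the cited studies.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    tp: attacks flagged as attacks      fp: normal traffic flagged as attacks
    tn: normal traffic left unflagged   fn: attacks that were missed
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "true_positive_rate": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),  # also called the false alarm rate
    }


# Hypothetical outcome of 200 test samples (100 malicious, 100 benign).
metrics = detection_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```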

In the same vein, the authors of [18] explored the efficacy of various machine
learning techniques in protecting IoT devices from DoS attacks. The aim of that
research was to propose effective methods for developing IDSs for IoT applications
using ensemble learning. The classifiers evaluated are Random Forest (RF), AdaBoost
(AB), Extreme Gradient Boosting (XGB), Gradient Boosted Machine (GBM), and
Extremely Randomized Trees (ETC). In more recent work [19], the authors tested five
supervised learning algorithms for distinguishing normal IoT packets from DoS attack
packets. The tested classifiers are the K-nearest neighbours “KDTree” algorithm
(KN), a Support Vector Machine with linear kernel (LSVM), a Decision Tree using Gini
impurity scores (DT), a Random Forest using Gini impurity scores (RF) and a 4-layer
Neural Network (NN). The accuracy rates of the classifiers ranged from approximately
91% to 99%.
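
To make the supervised setting concrete, the following is a minimal brute-force
k-nearest-neighbours classifier in plain Python. It is not the KDTree-based
implementation evaluated in [19]; the two features (packets per second and the
fraction of SYN packets) and all training values are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label a query point by majority vote among its k nearest training points.

    train: list of (feature_tuple, label) pairs; query: feature tuple.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled flows: (packets per second, fraction of SYN packets).
training_set = [
    ((10.0, 0.20), "normal"),
    ((12.0, 0.30), "normal"),
    ((11.0, 0.25), "normal"),
    ((900.0, 0.90), "attack"),
    ((950.0, 0.95), "attack"),
    ((870.0, 0.85), "attack"),
]

print(knn_predict(training_set, (920.0, 0.90)))
print(knn_predict(training_set, (11.5, 0.30)))
```

In practice the features would be scaled before computing distances, since here
the packet rate dominates the SYN fraction.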

Deep learning has also received a lot of attention in recent years. Because of
their ability to automatically extract powerful features from unlabelled data, deep
learning algorithms are recognised as important to intrusion detection in IoT
networks. The authors of [20] compared deep learning approaches with specific
conventional NIDS techniques. They found that deep-learning-based approaches
outperform conventional intrusion detection techniques in terms of detection
accuracy across a wide range of sample sizes and traffic anomaly types. In the same
vein, many other solutions, such as the work in [21, 22] and [23], have used the
Recurrent Neural Network (RNN) and its variants for network intrusion detection.
The Convolutional Neural Network (CNN), which has achieved great success in image
classification and pattern recognition, has also been used in many intrusion
detection systems (IDSs) by analysing images produced from network traffic
characteristics [24, 25]. The performance of the CNN-based intrusion detection
solution was evaluated in [24] using the synthetic datasets KDDCup 99 [26] and
NSL-KDD [27]. Auto-encoders and Variational Auto-encoders are two other common deep
learning techniques that are currently being used in research. Many recent studies
[28, 29] have looked into the robustness of these techniques in intrusion
detection. In terms of detection accuracy, the authors of [28] reported that the
proposed autoencoder-based IDS outperforms IDSs based on Principal Component
Analysis (PCA) by more than 15%. In summary, several recent approaches have
investigated the efficacy of deep learning techniques for intrusion detection.
Despite some progress in this area, the use of deep learning for intrusion
detection remains under-explored.

Honeypot technology aims to compensate for weaknesses in intrusion detection
systems by collecting information about current threats on a network and detecting
the emergence of new threats [30]. A honeypot is a cyber device that impersonates a
particular target (e.g., a service, database, or operating system) in order to
attract cyberattacks and use the intrusion attempts to collect information about
intruders and how they work [30]. The intelligence obtained from a honeypot can
significantly aid in improving the security of real-world production systems.
Honeypots have historically been classified by their level of interaction, which
expresses how much activity an attacker may have with them [31]. There are two
types of honeypots in this context: low-interaction honeypots and high-interaction
honeypots. A high-interaction honeypot enables attackers to compromise and gain
access to the actual vulnerable service or programme [31]. Since they do not
emulate any services, high-interaction honeypots aid in detecting unknown
vulnerabilities and gathering detailed information regarding an attacker’s
procedures. They are, however, more susceptible to infection, and attackers may
therefore gain full control of them in order to compromise and target other
production systems on the network [14]. Furthermore, they are complex and expensive
to deploy and maintain [31]. IoTPOT [32] is one of the first high-interaction
honeypots implemented in the field of IoT to impersonate IoT devices. SIPHON [33]
is also a scalable, high-interaction honeypot network for Internet of Things
applications. Honware [34] is another example of a recently created
high-interaction honeypot capable of simulating a wide range of IoT
products.

Low-interaction honeypots, on the other hand, operate as emulators of services
and operating systems, allowing the attacker only minimal interaction. As a
consequence, these honeypots are not vulnerable and cannot be compromised by
exploits; however, attackers can easily detect them by executing commands that the
emulator does not support [31, 35]. The common tool honeyd [36], which provides a
simple method to simulate different services provided by several machines on a
single computer, is an example of a low-interaction honeypot. Low-interaction
honeypot systems have been used in the field of IoT to capture malicious IoT
behaviours. Low-interaction honeypots such as Nepenthes [37] and Dionaea [38] are
also used for large-scale data collection on self-replicating malware in the wild.
To simulate the behaviour of IoT devices, the Dionaea honeypot [38] employs the MQTT
protocol. The authors of [39] used a low-interaction honeypot to identify and fix
vulnerabilities in IoT devices. The honeypot is built automatically, using machine
learning technology to learn the behavioural characteristics of various types of
IoT devices.
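
The emulation principle behind a low-interaction honeypot, and the reason such a
honeypot is easy to fingerprint, can be sketched as below. This toy class is not
modelled on honeyd or Dionaea; the canned responses, the class name and the
example addresses are all hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical canned replies of an emulated IoT shell.
CANNED_RESPONSES = {
    "uname -a": "Linux camera 3.10.14 armv7l GNU/Linux",
    "whoami": "root",
    "ls": "bin  etc  tmp  var",
}


class LowInteractionHoneypot:
    """Answers a fixed set of commands and logs every interaction."""

    def __init__(self):
        self.log = []

    def handle(self, source_ip: str, command: str) -> str:
        cmd = command.strip()
        name = cmd.split()[0] if cmd else ""
        # Unsupported commands fall through to a generic error, which is exactly
        # what lets an attacker recognise the emulator.
        reply = CANNED_RESPONSES.get(cmd, f"sh: {name}: not found")
        self.log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "source_ip": source_ip,
            "command": cmd,
            "emulated": cmd in CANNED_RESPONSES,
        })
        return reply


honeypot = LowInteractionHoneypot()
print(honeypot.handle("203.0.113.5", "whoami"))
print(honeypot.handle("203.0.113.5", "wget http://example.invalid/x"))
```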

MIHs (Medium Interaction Honeypots) are a mixture of low- and high-interaction
honeypots. Researchers recognise this type of honeypot system as offering a
full honeypot solution for intrusion monitoring and detection [31]. Several MIH
IoT honeypot models have thus been proposed in the literature [31, 40, 41, 42].
For example, the authors of [31] proposed a hybrid honeypot architecture
based on low-interaction honeypots (honeyd instances) that function as service and
operating system emulators. Malicious traffic directed to these emulators is then
seamlessly routed to high-interaction honeypots, where attackers can communicate
with real services. In a subsequent paper [41], the authors defined a hybrid IoT
honeypot architecture with machine learning for combating zero-day DDoS attacks. In
the same vein, the authors of [40] developed a new interconnected and collaborative
hybrid honeynet for IoT networks. The authors of [43] defined an IoT-based honeynet
that included both virtual and physical IoT devices. For traffic analysis, the
proposed honeypot system made use of supervised machine learning algorithms.
Examples of recently developed IoT honeypots are shown in Table 6.1.


6.3 Intrusion Detection Framework 


In the proposed detection system, we are primarily interested in detecting and
mitigating the unknown malware responsible for zero-day attacks. The term
“zero-day exploit” refers to malicious code written by malicious actors in order to
exploit a “zero-day vulnerability.” This form of malware can go unnoticed for
several years and is extremely dangerous because only the perpetrator is aware of
the vulnerability, so no security fixes that address it and block subsequent
zero-day exploits are available [9]. To address this problem, we propose a new
detection and mitigation approach based on honeypots and machine learning. The
honeypot framework attracts hackers by design in order to track, deflect, and
analyse attempts to gain unauthorised access to IoT devices. The Machine Learning
(ML) based detection system, in turn, applies machine learning together with binary
visualisation techniques to identify possible unknown cyber security threats.

Table 6.1. Examples of recently developed IoT honeypots.

— Dionaea [38]: uses the MQTT protocol to simulate IoT behaviour.

— U-POT [44]: for devices that use the Universal Plug and Play (UPnP) protocol.

— ZigBee Honeypot [35]: simulates a ZigBee gateway.

— SIPHON [33]: a high-interaction honeypot platform for IoT devices, with 80
high-interaction devices.

— Honware [34]: a high-interaction honeypot framework that can emulate
different IoT devices.

— Thingpot [45]: emulates different IoT communication protocols.

— HIoTPOT [42]: emulates Telnet services of various IoT devices.

— Multiport Honeypots [40]: medium-high interaction IoT honeypots that can
simulate UPnP services and SOAP service ports.

Figure 6.1. Process flow for the proposed solution with the honeypot and machine
learning-based detection framework.

The proposed system’s entire mechanism is depicted in Figure 6.1. As shown in
the figure, a Honeywall is built into the honeypot system to isolate the honeynet
from the production infrastructure of the organisation. The Honeywall is also used
as the primary point of entry into the honeynet, providing complete control over
all traffic entering and leaving the network. Any interaction with the honeynet
system is deemed malicious and is routed to the pre-processing module for further
inspection. This data is also converted to a suitable format (a 2D image) so that it
can be used to continuously train the machine learning model and thereby improve
its detection accuracy. Data pre-processing also involves analysing the collected
data to better understand the attackers’ tools, strategies, techniques, and
motives. This can be accomplished by incorporating tools and frameworks into the
honeypots to record all system activities.

All data collected from deep inspection of network- and system-level interactions
with the honeynet devices are logged in a central threat actor database. This
includes the low-level information, obtained from deep packet inspection of the
captured network traffic, that is needed to generate the aforementioned 2D images,
which are then used by the ML-based module to train the NIDS. To provide an
authoritative summary of the interactions that have occurred, these records are
stored in the database alongside identifying information such as the originating
IP addresses, timestamps, and corresponding service information for the targeted
network services.
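
One possible shape for such a record is sketched below using an in-memory SQLite
database. The table layout, field names and example values are assumptions made
for illustration; the chapter does not specify the schema of the threat actor
database.

```python
import sqlite3
from dataclasses import dataclass, astuple


@dataclass
class Interaction:
    """One logged honeynet interaction (hypothetical schema)."""
    source_ip: str   # originating IP address
    timestamp: str   # ISO 8601 time of the interaction, in UTC
    service: str     # targeted network service, e.g. "telnet/23"
    payload: bytes   # raw captured bytes, later rendered as a 2D image


def open_database(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS interactions "
        "(source_ip TEXT, timestamp TEXT, service TEXT, payload BLOB)"
    )
    return conn


def log_interaction(conn: sqlite3.Connection, item: Interaction) -> None:
    conn.execute("INSERT INTO interactions VALUES (?, ?, ?, ?)", astuple(item))
    conn.commit()


conn = open_database()
log_interaction(conn, Interaction("198.51.100.7", "2021-01-01T00:00:00Z",
                                  "telnet/23", b"\x00\xff"))
print(conn.execute("SELECT COUNT(*) FROM interactions").fetchone()[0])
```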


6.3.1 Honeypot System 


In the proposed solution, the honeypot system is mainly used to gather intelli- 
gence about attack attempts on IoT devices. It involves two main components: 
the honeynet and the Honeywall. The honeynet is used to lure attackers into
exploiting the vulnerabilities present in IoT devices, and all interactions with
this network are considered malicious. Data collection is done at the Honeywall
gateway, which is the main point of entry to the honeynet. Once data is captured,
it is securely sent to the pre-processing system for further analysis and for
training the classification model. The honeynet consists of different IoT devices
that capture different malicious behaviours. However, building a honeynet of IoT
devices using traditional methods is challenging due to the special characteristics
of IoT. Thus, many researchers have tried to design new honeypots for IoT
devices [34, 40, 43, 44]. As mentioned previously, Table 6.1 provides some examples
of recently developed IoT honeypots.

However, the most appropriate implementation of the IoT-based honeynet
system should simulate the whole IoT platform along with all the protocols
supported in IoT communications. For example, Thingpot [45] is a virtual IoT
honeypot capable of catching various IoT-based botnets by emulating different
IoT communication protocols along with the behaviour of an entire IoT platform. In
addition, IoT honeypots should be able to provide high-level interaction in order
to motivate attackers to perform their malicious activities and thereby keep track
of a dynamic threat landscape.


6.3.2 Machine Learning Detection Framework 


The machine learning-based detection framework is a crucial component of the
proposed solution. As shown in Figure 6.1, this framework consists of two main
steps: first, obtaining the visual representation of the collected network
traffic, and second, processing this visual representation with the trained
machine learning model. The main idea of this framework is based on the
Malware-Squid approach proposed in [46], which represents the cyber-defence service
in the Cyber-Trust project [47]. This approach uses the Hilbert space-filling
curve [48] as its main clustering algorithm and assigns a specific colour to each
byte as it is converted into a 2D image. The Hilbert curve outperforms other curves
in preserving the locality between objects in multi-dimensional spaces, which helps
to create much more appropriate RGB images for the classification process [11]. The
colour assigned to each byte depends on its ASCII character class, as follows:


• Blue for printable characters

• Green for control characters

• Red for extended characters

• Black for the null character, or 0x00

• White for the non-breaking space, or 0xFF
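
The colour scheme listed above can be written as a small lookup function. This is
a sketch only: it assumes pure RGB colours and the usual ASCII ranges (0x20 to
0x7E printable, 0x01 to 0x1F and 0x7F control, 0x80 to 0xFE extended), whereas the
actual tool may use graded shades.

```python
def byte_to_rgb(value: int) -> tuple:
    """Map a byte value to an (R, G, B) colour following the scheme above."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must be a single byte")
    if value == 0x00:
        return (0, 0, 0)          # black: null character
    if value == 0xFF:
        return (255, 255, 255)    # white: 0xFF
    if 0x20 <= value <= 0x7E:
        return (0, 0, 255)        # blue: printable characters
    if value < 0x20 or value == 0x7F:
        return (0, 255, 0)        # green: control characters
    return (255, 0, 0)            # red: extended characters (0x80 to 0xFE)


sample = b"GET /\r\n\x00\xff\x90"
print([byte_to_rgb(b) for b in sample])
```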


These generated byte arrays are then processed using the Hilbert algorithm, 
transforming them into images that retain optimal locality for pattern recognition, 
allowing them to be processed by the machine learning image classification models. 
The size of the output RGB image is 784 (1024*256) bytes. Figure 6.2 shows Bin- 
Vis images for both normal and malware network traffic, which are created using 
the Hilbert space-filling curve. 
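
The locality-preserving layout can be illustrated with the well-known iterative
conversion from a position along the Hilbert curve to grid coordinates. The sketch
below is a generic version of that conversion and not the code used in the
Cyber-Trust tooling; the function names and the tiny 2 x 2 example are
illustrative.

```python
def hilbert_d2xy(order: int, d: int) -> tuple:
    """Map index d along a Hilbert curve to (x, y) on a 2**order x 2**order grid."""
    side = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:               # rotate or flip the quadrant when needed
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def layout_on_curve(data: bytes, order: int) -> list:
    """Place consecutive bytes on consecutive curve positions of a square grid."""
    side = 1 << order
    if len(data) > side * side:
        raise ValueError("data does not fit on the grid")
    grid = [[None] * side for _ in range(side)]
    for index, value in enumerate(data):
        x, y = hilbert_d2xy(order, index)
        grid[y][x] = value
    return grid


for row in layout_on_curve(b"\x01\x02\x03\x04", order=1):
    print(row)
```

Because consecutive indices always land on neighbouring cells, bytes that are
close together in the captured traffic stay close together in the image.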

There are a number of learning algorithms available for performing network
traffic classification based on the generated 2D images. However, in this work, we
are interested in unsupervised learning algorithms that can accurately classify the
network traffic as “normal” or “malicious” with a reasonable rate of false alarms.
In this context, a variety of unsupervised classifiers such as Autoencoders, the
Self-Organizing Incremental Neural Network [49], the Residual Neural Network
(ResNet) [11] and MobileNet [46, 50] have been found to be effective in detecting
abnormal network traffic, with an overall accuracy that meets the values required
in practical use (from 94% to 96%).

Figure 6.2. Binvis images of both normal and malware network traffic created with
the Hilbert space-filling curve [11].


6.4 Conclusion 


In this chapter, we introduced a new approach for network intrusion detection
based on machine learning and honeypot technology. For the implementation of
the proposed intrusion detection framework, we have discussed existing
technologies in the fields of IoT honeypots and machine learning. The use of IoT
honeypots that can simulate a whole IoT platform will ensure the logging of a
wide range of characteristics of IoT-based security threats, especially those of
new threat vectors. The collected malware traffic can also be used to effectively
train the ML-based detection system, which is expected to enhance its detection
accuracy and therefore protect the whole production network against newly emerging
security threats.

As future work, we will implement the proposed IDS framework in a real-world
environment and investigate in depth the open issues related to IoT honeypots in
real-time scenarios. We also intend to compare the performance of the proposed
solution against representative models in this field.

Two of the most significant technological advancements currently underway, both of
which are spreading rapidly in industry and academia, are blockchains and the advent
of quantum computing. Blockchains have advanced dramatically in recent years and
have found numerous applications in many fields, with the expectation that they
will significantly enhance security; the quantum threat and the implementation of
post-quantum signatures in blockchains are therefore trending topics in today’s
scientific community. Like any product that is based on cryptographic primitives,
blockchain technology is affected by the advent of quantum computing, since in this
regard it is not essentially different from other resilient and secure
applications. This chapter provides the theoretical background on recent
developments in the area of post-quantum cryptography (PQC), aiming at the
incorporation of secure cryptographic primitives into blockchain technology. To
this end, the chapter assesses contemporary PQC algorithms and presents the current
status of NIST’s third-round PQC candidates. In addition, it demonstrates the impact
of quantum computing on blockchains and investigates the incorporation of PQC
primitives into the various blockchain platforms. This chapter thus aims to provide
guidelines and demonstrate the challenges to both researchers and industry
regarding the implementation of post-quantum algorithms in blockchain applications.


7.1 Introduction 


Since the emergence of Bitcoin, blockchain technology has attracted growing
interest in recent years as a novel technology that provides the degree of
decentralisation required by modern applications and services in an efficient and
robust way. A blockchain is a distributed database of records, or shared ledger, of
all the transactions or digital events that have been executed and exchanged among
a number of parties. Blockchains have already adopted basic cryptographic
primitives, such as hash functions and digital signatures, which are used to
achieve consensus and authenticate transactions. Most popular blockchain platforms
use a linked list of blocks, in which each block contains a hash pointer to the
previous one, while the data of each block is organised using Merkle trees.
However, such schemes and algorithms cannot guarantee the security requirements
that might arise in the future. As modern computing becomes increasingly
globalised, security goals include not only basic requirements, such as tamper
resistance and trust, but also compelling demands for privacy preservation
mechanisms and for enforcing accountability in many applications [1]. Since
blockchain technology has been adopted not only by the financial industry but by
many other areas as well [2–4], its security and business architecture cannot be
easily modified. Therefore, the security of blockchains should take into account
not only current means of attack, but also security issues that might surface in
the future.
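
The hash-linked structure described above can be sketched in a few lines. This is
a simplified illustration with SHA-256 and made-up transactions; it duplicates the
last node on odd-sized Merkle levels, which is one common convention, and it does
not reproduce the block format of any particular platform.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(transactions) -> bytes:
    """Hash transactions pairwise, level by level, until one root remains."""
    if not transactions:
        raise ValueError("a block needs at least one transaction")
    level = [sha256(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])   # duplicate the last node on odd levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def block_hash(prev_hash: bytes, transactions) -> bytes:
    return sha256(prev_hash + merkle_root(transactions))


def make_block(prev_hash: bytes, transactions) -> dict:
    return {"prev_hash": prev_hash, "transactions": list(transactions),
            "hash": block_hash(prev_hash, transactions)}


def chain_is_valid(chain) -> bool:
    """Recompute every block hash and check every hash pointer."""
    for position, block in enumerate(chain):
        if block["hash"] != block_hash(block["prev_hash"], block["transactions"]):
            return False
        if position > 0 and block["prev_hash"] != chain[position - 1]["hash"]:
            return False
    return True


genesis = make_block(b"\x00" * 32, [b"alice->bob:5"])
second = make_block(genesis["hash"], [b"bob->carol:2", b"carol->dave:1"])
chain = [genesis, second]
print(chain_is_valid(chain))
```

Changing any transaction alters its block's Merkle root and hash, so the stored
hash and the pointer held by the next block no longer match.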

Essentially, for the authentication of transactions, blockchains rely on the
elliptic curve digital signature algorithm (ECDSA), which is not adequate to
deal with the quantum threat. Shor’s algorithm, run on a sufficiently large quantum
computer, solves the underlying discrete logarithm problem in polynomial time. If
this algorithm is used by an attacker, then the victim’s private key can be derived
from the public key and the system’s security is compromised. Similarly, if the
attacker forges the user’s signature, then all the user’s assets and privacy will
be lost. Therefore, considering the cryptographic underpinnings of blockchains,
this chapter underlines the post-quantum security aspects that can be adopted in
blockchain technology to enable it to resist quantum attacks based on Shor’s and
Grover’s algorithms. More precisely, this chapter presents the impact of
quantum-computing attacks on blockchains and investigates the incorporation of PQC
primitives into the various blockchain platforms. In particular, the most
appropriate post-quantum cryptosystems for blockchains are examined along with
their main challenges. This chapter can therefore be used as a guide for the
development of post-quantum blockchains, since it is necessary that both
researchers and industry be aware of the quantum computing area and its advances.
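
The key-recovery threat mentioned above can be made tangible with a toy discrete
logarithm. The sketch uses the multiplicative group modulo a small prime as a
stand-in for the elliptic-curve group used by ECDSA, and a classical brute-force
search rather than Shor's algorithm; the parameters are illustrative and far too
small to be secure.

```python
def public_key(generator: int, private_key: int, prime: int) -> int:
    """Toy public key: generator raised to the private key, modulo the prime."""
    return pow(generator, private_key, prime)


def recover_private_key(generator: int, target: int, prime: int) -> int:
    """Brute-force discrete logarithm; feasible only because the group is tiny."""
    value = 1
    for exponent in range(prime - 1):
        if value == target:
            return exponent
        value = (value * generator) % prime
    raise ValueError("target is not a power of the generator")


# Illustrative parameters: 2 generates the multiplicative group modulo 101.
prime, generator = 101, 2
secret = 57
pub = public_key(generator, secret, prime)
print(recover_private_key(generator, pub, prime) == secret)
```

For realistic group sizes this search is classically infeasible, which is what the
security of the signature scheme rests on; a quantum attacker running Shor's
algorithm would not face that barrier.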

The chapter consists of six sections, including the current introductory section.
More precisely, the structure of the chapter is as follows: Section 7.2 describes
the state of the art in post-quantum cryptography (PQC), presenting the public-key
PQC cryptosystems, the PQC signing algorithms and the current status of the NIST
process. Section 7.3 deals with the advances of PQC in blockchain technology and
presents the blockchain platforms that support PQC primitives. Section 7.4 compares
the performance of the PQC primitives that passed to the third round of the NIST
call and describes the resistance of PQC algorithms to various cryptographic
attacks. Finally, the main conclusions are summarised in Section 7.5.


7.2 State-of-the-Art in PQC 


7.2.1 Public-Key Post-Quantum Cryptosystems


Post-quantum cryptography (PQC) refers to cryptographic systems that remain
secure even if large-scale quantum computers become a reality. Quantum
computing makes use of quantum-mechanical phenomena, which makes it more
powerful than classical computing for certain tasks. In simple words, classical
computers operate on bits, which can have one of two values (states), i.e. 0 or
1, whereas quantum computers operate on qubits, which can be in a superposition
of states, i.e. 0, 1, or (a little bit of) both. Quantum algorithms can leverage
this superposition of states to solve efficiently several mathematical problems
for which classical computers practically fail to provide a solution. Not every
problem can be solved efficiently in this way; however, several problems that
are considered difficult today are efficiently solvable by a quantum computer.
Some of these problems constitute building blocks for contemporary cryptographic
algorithms, thus rendering them fully insecure in the post-quantum era.

The most famous quantum algorithms with a direct impact on the security of
cryptographic systems are Shor's integer factorisation algorithm, which factors
an integer N in polynomial time with respect to the length of N, and Grover's
algorithm, which searches an unstructured database quadratically faster than
any classical algorithm.

Current symmetric ciphers with 256-bit keys, such as AES-256, are believed to
be quantum-resistant. Similarly, hash functions with proper parameters (i.e.,
length of the hashed value) are also considered post-quantum secure in terms of
collision resistance. Therefore, post-quantum cryptography research focuses on
asymmetric algorithms, so as to replace RSA, (EC)DH and (EC)DSA. These
post-quantum secure algorithms are based on mathematical problems that are
believed to be difficult in both the classical and the quantum case. Moreover,
since hash functions are also post-quantum secure, several post-quantum digital
signature schemes whose security relies on the security of hash functions also
exist.
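As a rough worked example of the statement about key lengths: Grover's search
needs on the order of 2^(k/2) iterations to find a k-bit key, so the effective
brute-force security level is about half the key length. The helper name below
is illustrative, and the estimate ignores the practical overhead of running
Grover's algorithm; it is a back-of-the-envelope sketch, not a cost model.

```python
def effective_security_bits(key_bits: int) -> dict:
    """Rough brute-force cost exponents for a symmetric key of key_bits bits."""
    return {
        "classical": key_bits,    # about 2^k trial encryptions
        "grover": key_bits // 2,  # about 2^(k/2) Grover iterations
    }


for name, bits in [("AES-128", 128), ("AES-256", 256)]:
    estimate = effective_security_bits(bits)
    print(f"{name}: classical 2^{estimate['classical']}, "
          f"Grover 2^{estimate['grover']}")
```

Under this estimate a 256-bit key retains roughly a 128-bit security level,
which is why AES-256 is regarded as quantum-resistant.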

More precisely, post-quantum cryptographic algorithms are mainly classified
into the following categories, each of which rests its security on a specific
difficult mathematical problem:

• Code-based cryptography,
• Lattice-based cryptography,
• Multivariate cryptography,
• Hash-based cryptography,
• Supersingular elliptic curve isogeny cryptography.

Hybrid approaches are also considered. In addition, a few algorithms are based
on the security of zero-knowledge proofs. All of these are described next.


Code-based cryptography 


The security of the cryptographic algorithms included in this class is based on
coding theory, i.e., on the inherently difficult problem of decoding an
erroneous codeword that has been produced through an unknown error-correcting
code. The classical example is McEliece's cryptosystem, whose security is based
on the syndrome decoding problem. McEliece's cryptosystem provides fast
encryption and relatively fast decryption, which is an advantage for performing
rapid blockchain transactions. However, it requires large matrices that act as
public and private keys, which may be a restriction in constrained environments.
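To make the notion of a syndrome concrete, the following toy sketch uses the
small Hamming(7,4) code, whose parity-check matrix is public and highly
structured, so decoding is trivial. This is an illustrative assumption only:
McEliece-type schemes rely on much larger codes whose structure is hidden, and
the function names are not taken from any library.

```python
# Parity-check matrix of the Hamming(7,4) code: column j (1-based) is the
# binary representation of j, least significant bit in the first row.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def syndrome(word):
    """Return H * word over GF(2); all zeros means a valid codeword."""
    return [sum(h * b for h, b in zip(row, word)) % 2 for row in H]


def correct_single_error(word):
    """Use the syndrome to locate and flip a single erroneous bit."""
    s = syndrome(word)
    position = s[0] + 2 * s[1] + 4 * s[2]  # 1-based error position, 0 if none
    fixed = list(word)
    if position:
        fixed[position - 1] ^= 1
    return fixed


codeword = [1, 1, 1, 0, 0, 0, 0]
received = list(codeword)
received[4] ^= 1                           # flip the fifth bit
print(syndrome(received), correct_single_error(received) == codeword)
```

For a code without visible structure, no such shortcut from syndrome to error
position is known, which is the hardness assumption behind this class.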


Lattice-based cryptography 


This class includes cryptographic algorithms whose construction is based on
lattices, which are sets of points in n-dimensional spaces with a periodic
structure. These algorithms rest their security on the known difficulty of
specific mathematical problems on lattices, such as the Shortest Vector Problem
(SVP), which asks for the shortest non-zero vector within a lattice and is
NP-hard (under randomized reductions). Other similar difficult lattice problems
also exist, such as the Closest Vector Problem (CVP), the Short Integer
Solution (SIS) problem or the Shortest Independent Vectors Problem (SIVP). An
important lattice-based problem, which is present in several lattice-based
cryptographic systems, is the "learning with errors" (LWE) problem, which has
security reductions to variants of SVP.
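A minimal sketch of what an LWE instance looks like is given below: the
attacker sees a random matrix A and the vector b = A·s + e (mod q) and must
recover the secret s despite the small error e. The parameters and the function
name are illustrative assumptions and far too small to offer any security.

```python
import random


def lwe_samples(secret, m, q, noise=1, rng=random):
    """Generate m toy LWE samples (A, b) together with the hidden error e."""
    n = len(secret)
    A = [[rng.randrange(q) for _ in range(n)] for _ in range(m)]
    e = [rng.randint(-noise, noise) for _ in range(m)]  # small error terms
    b = [
        (sum(a * s for a, s in zip(row, secret)) + err) % q
        for row, err in zip(A, e)
    ]
    return A, b, e


A, b, e = lwe_samples(secret=[3, 14, 15, 92], m=6, q=97,
                      rng=random.Random(0))
print(b)
```

Without the error vector the secret could be found by Gaussian elimination;
the added noise is what turns the task into a hard lattice problem.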


Multivariate cryptography 


Multivariate cryptography relies on the complexity of solving systems of
multivariate equations, which has been demonstrated to be either NP-hard or
NP-complete. In general, it is known that such cryptographic schemes have some
limitations in their decryption speed (due to the "guess work" involved).
Currently, some of the most promising multivariate-based schemes are based on
Hidden Field Equations (HFE).


Hash-based cryptography 


This class includes digital signature schemes whose security relies on the
security of the underlying hash function instead of on the hardness of a
mathematical problem. Schemes of this kind date back to the late 1970s, when
Lamport proposed a signature scheme based on a one-way function.


Supersingular elliptic curve isogeny cryptography 

This class includes cryptographic algorithms whose security relies on the
isogeny protocol for ordinary elliptic curves, enhanced to withstand quantum
attacks. Such cryptosystems usually employ key sizes in the order of a few
thousand bits.


Other approaches


Post-quantum cryptography based on zero-knowledge proofs: Based on the classical
concept of zero-knowledge proofs, these cryptographic algorithms are
generalizations of hash-based cryptographic schemes, enriched by useful
cryptographic properties of symmetric ciphers in order to construct
zero-knowledge proofs.


Hybrid approaches: Hybrid schemes seem to be the immediate next step towards
post-quantum security, since they merge pre-quantum and post-quantum
cryptosystems, aiming to protect the exchanged data both from quantum attacks
and from attacks against the post-quantum schemes in use. However, such schemes
involve implementing two complex cryptosystems, which requires significant
computational resources and more energy. Therefore, future developers of hybrid
post-quantum cryptosystems for blockchains will have to look for a trade-off
between security, computational complexity and resource consumption.


7.2.2 Post-Quantum Signing Algorithms


In real-world applications today, the most widely used cryptographic schemes for
digital signatures are RSA, the Digital Signature Algorithm (DSA) and the
Elliptic Curve Digital Signature Algorithm (ECDSA). However, as already
mentioned, such digital signature schemes are not post-quantum secure.
Therefore, it is essential for blockchain applications to provide long-term
security and ensure that digital signatures are secure against quantum
computers. To this end, we subsequently focus explicitly on post-quantum
signing algorithms.


Hash-based digital signatures 


Hash-based signature (HBS) algorithms are schemes with minimal security
assumptions; they are reasonably fast, provide small signatures and have strong
security guarantees (their security proofs are relative to plausible properties
of the cryptographic hash functions).

HBS schemes can be classified as stateless and stateful schemes, which can be
further categorized as One-Time Signature (OTS), Few-Time Signature (FTS),
Multi-Time Signature (MTS) and Hierarchical Signature (HS) schemes, depending on
key and signature generation. A taxonomy of these schemes is shown in
Figure 7.1.


Stateful one-time signature (OTS) schemes: The Lamport scheme, the Winternitz
scheme and its variants WOTS+ and WOTS^PRF are characteristic algorithms in
this class. In OTS schemes, the private key is generated uniformly at random,
whereas the public key is derived from the private key by applying a hash
function; the one-wayness of the hash function, together with its collision
resistance, ensures that knowledge of the public key does not allow the
computation of the private key. Although the Lamport scheme possesses strong
security properties, it is impractical due to several limitations: first, it is
a one-time signature scheme (i.e., each key pair can be used to sign only
once); second, it requires extremely large keys; and third, the resulting
signatures are also large (see Table 7.1). The fact that it is an OTS scheme
implies that each secret key must be used only once for signing; otherwise, an
attacker may be able to forge valid signatures and impersonate the user (since
every signature reveals part of the secret key). The efficiency drawbacks of
the Lamport scheme are alleviated by the Winternitz One-Time Signature (WOTS)
scheme, which utilizes a so-called Winternitz parameter that controls a
time/memory trade-off. In principle, reducing the space required for keys and
signatures makes WOTS a good choice for memory-constrained embedded devices,
but at the cost of slower signing and verification.
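The Lamport construction described above can be sketched in a few lines. The
sketch assumes SHA-384 as the hash function (so that message digests are 384
bits long, as in Table 7.1); all function names are illustrative, and the code
is meant for exposition rather than production use.

```python
import hashlib
import secrets

HASH = hashlib.sha384
N_BITS = HASH().digest_size * 8  # 384 digest bits, one key pair per bit


def keygen():
    """Private key: two random strings per digest bit; public key: their hashes."""
    sk = [[secrets.token_bytes(N_BITS // 8) for _ in range(2)]
          for _ in range(N_BITS)]
    pk = [[HASH(x).digest() for x in pair] for pair in sk]
    return sk, pk


def _digest_bits(message: bytes):
    digest = HASH(message).digest()
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(N_BITS)]


def sign(sk, message: bytes):
    """Reveal one of the two secret strings for every bit of the digest."""
    return [sk[i][bit] for i, bit in enumerate(_digest_bits(message))]


def verify(pk, message: bytes, signature) -> bool:
    bits = _digest_bits(message)
    if len(signature) != N_BITS:
        return False
    return all(HASH(s).digest() == pk[i][bit]
               for i, (s, bit) in enumerate(zip(signature, bits)))


sk, pk = keygen()
sig = sign(sk, b"transfer 1 coin")
print(verify(pk, b"transfer 1 coin", sig), sum(len(s) for s in sig))
```

With these parameters a signature is 384 × 48 = 18,432 bytes and the private
key 2 × 384 × 48 = 36,864 bytes, which appears to match the Lamport row of
Table 7.1 if its unit is read as kilobytes. Because signing reveals half of the
secret strings, signing a second message with the same key leaks enough
material to help an attacker forge signatures.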
Figure 7.1. A taxonomy of HBS cryptographic schemes [9].


Table 7.1. OTS and FTS schemes for 384-bit message length and about 128-bit
post-quantum security level.

Signature Scheme   Type   Signature Size (Kb)   Key Size (Kb)
Lamport            OTS    18.4                  36.9
WOTS               OTS    4.8                   4.8
WOTS+              OTS    3.2                   3.2
WOTS^PRF           OTS    3.2                   3.7
HORS-T             FTS    17.3                  0.05

Stateful Multi-time Signature Schemes (MTS): To tackle the inherent limitations
of OTS schemes, MTS schemes construct many-time signatures by using OTS as an
underlying primitive. The first such scheme was proposed by Merkle and is
called the Merkle Signature Scheme (MSS) [5]. This scheme utilizes a so-called
Merkle tree, which combines a large number of OTS key pairs into a single
binary hash tree structure (as shown in Figure 7.2). The root of the tree
constitutes a global public key. Due to the properties of the underlying hash
functions used to build a Merkle tree, the signer (and nobody else) can easily
prove that a one-time public key (e.g. a WOTS+ public key) is associated with a
global public key by revealing appropriate nodes of the tree, the so-called
authentication path, which allows the verifier to reconstruct the path from the
relevant one-time public key to the tree's root upon signature verification.

Figure 7.2. A Merkle tree with a verification path for the OTS public key h1,0 [5].
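The role of the authentication path can be illustrated with a minimal Merkle
tree over placeholder one-time public keys. The sketch assumes SHA-256 and a
number of leaves that is a power of two; the function names and the leaf
contents are illustrative only.

```python
import hashlib


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(leaves):
    """Return all tree levels, from the hashed leaves up to [root]."""
    if not leaves or len(leaves) & (len(leaves) - 1):
        raise ValueError("number of leaves must be a power of two")
    level = [H(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def auth_path(levels, index):
    """Collect the sibling node on every level below the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])
        index //= 2
    return path


def root_from_path(leaf, index, path):
    """Recompute the root from one leaf and its authentication path."""
    node = H(leaf)
    for sibling in path:
        node = H(node + sibling) if index % 2 == 0 else H(sibling + node)
        index //= 2
    return node


ots_public_keys = [f"ots-public-key-{i}".encode() for i in range(8)]
levels = build_tree(ots_public_keys)
root = levels[-1][0]                      # the global public key
path = auth_path(levels, 5)
print(root_from_path(ots_public_keys[5], 5, path) == root)
```

The verifier only needs the global public key, the one-time public key and
log2(number of leaves) sibling hashes, rather than the whole tree.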

Moreover, there are several efficient ways to handle Merkle trees, especially
the authentication path (e.g. appropriately caching the authentication path
from the previous signature). Such techniques give rise to more efficient
signature schemes based on Merkle trees, with the Extended Merkle Signature
Scheme (XMSS) being a prominent example [6]. The XMSS scheme is an
appropriately modified Merkle hypertree, whose leaves are based on a WOTS+
scheme. More precisely, the XMSS scheme utilizes a Merkle tree with one major
difference: a bitmask is XORed with the child nodes before their hashes are
concatenated into the parent node. The use of the bitmask XOR allows the
collision-resistant hash function family to be replaced by one with weaker
requirements. Each leaf of the tree is the root of a child tree (also an XMSS
tree), called an L-tree, which holds the OTS public keys.
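The difference between a plain Merkle node and a bitmasked node can be sketched
as follows. This is a simplification under stated assumptions: SHA-256 is used
unkeyed, and the masks are passed in explicitly, whereas the actual XMSS
specification uses keyed hashing and derives its masks in a prescribed way. The
function names are illustrative.

```python
import hashlib


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def plain_parent(left: bytes, right: bytes) -> bytes:
    """Parent node in a classical Merkle tree."""
    return hashlib.sha256(left + right).digest()


def masked_parent(left: bytes, right: bytes,
                  mask_left: bytes, mask_right: bytes) -> bytes:
    """Parent node where each child is XORed with a public bitmask first."""
    return hashlib.sha256(
        xor_bytes(left, mask_left) + xor_bytes(right, mask_right)
    ).digest()


left = hashlib.sha256(b"left child").digest()
right = hashlib.sha256(b"right child").digest()
mask = hashlib.sha256(b"public bitmask").digest()
print(masked_parent(left, right, mask, mask).hex())
```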


Stateful Hierarchical Signature Schemes (HS): Stateless hash-based signature
schemes are generally considered slow, since it is necessary to construct a new
tree to generate a new key pair. Therefore, hierarchical signature schemes (HS)
constitute the next step towards improving efficiency. HS schemes are actually
MTS schemes that use other hash-based signatures in their construction. The
idea of HS is based on the formation of a hypertree that involves tree chaining
by using multiple layers of MSS trees. In this way, the upper layers are used
to sign the roots of the layers below, while only the lowest layer is used to
sign messages. Notable examples of HS are XMSS-MultiTree (XMSS^MT) (see also
Figure 7.3), XMSS with tightened security (XMSS-T) and the Leighton-Micali
Scheme (LMS). XMSS^MT is a good option for applications that require many
messages to be signed, provided that the optimization techniques mentioned
above (use of a PRNG, caching of the authentication path, etc.) are still
present.

Figure 7.3. XMSS^MT with 4 layers [42].

Another, more recent, stateful HBS scheme, which utilizes a blockchain for
storing "authentication paths", is the so-called BPQS scheme [7]. BPQS is
actually a modified XMSS scheme, using a single authentication path (i.e. a
chain and not a tree). The researchers in [7] suggest that BPQS fits well with
blockchains.

Figure 7.4. Hypertree structure used in SPHINCS [9].


Stateless Hierarchical Signature Schemes (HS): The main property of stateful
hierarchical signature schemes is that the signing process requires updating
the secret key. In other words, for stateful signature schemes, signing
requires keeping state of the used one-time keys and making sure they are never
reused. However, there are also stateless hierarchical signature schemes, the
most prominent example being SPHINCS [8] and its variants SPHINCS-Simpira,
Gravity-SPHINCS and SPHINCS+. Similar to XMSS^MT, SPHINCS uses a hypertree in
which the upper layers use XMSS with WOTS+ to sign the roots of the trees below
them, while the lowest layer uses a Merkle tree construction with HORS-T for
signing messages (as shown in Figure 7.4). Since stateless schemes do not keep
a record of used key pairs, SPHINCS ensures the correct few-time usage of key
pairs by deploying multiple HORS-T key pairs and selecting a random one for
each signature generation (HORS-T is a few-time, rather than one-time,
signature primitive (FTS)). Hence, no path-state tracking is required.

In stateless schemes such as SPHINCS, generating all private (HORS-T and
WOTS+) keys with a PRNG and computing one tree in each layer for signature
generation results in an efficient computation. Nevertheless, stateless schemes
pose the following performance issues. First, signature generation is more
expensive because the key pairs are used in random order rather than successive
order; hence, several optimization algorithms that are used in stateful schemes
are not applicable. Moreover, in contrast to WOTS+, HORS-T signatures are much
larger [9]. Note that Table 7.1 also provides relevant information on HORS-T,
as an FTS primitive, compared to OTS primitives. A comparison between the
discussed stateless (SPHINCS) and stateful (MSS, XMSS, XMSS^MT) HBS schemes is
given in Table 7.2, whereas an overall evaluation is given in Table 7.3.

Table 7.2. Comparison between stateful and stateless signature schemes in [9].

Signature                Base             Key Re-use   Signature   Key
Scheme    Instantiation  Scheme           Capability   Size (Kb)   Size (Kb)
MSS       SHA-384        WOTS             260          7.7         0.05
XMSS      SHA-256        WOTS^PRF         299          4.7         0.03
XMSS^MT   AES-128        WOTS^PRF         280          10.7        Private key = 26.1
                                                                   Public key = 1.8
SPHINCS   SHA-256        HORS-T & WOTS+   Unlimited    41.0        1.0

Even though HBS schemes are considered post-quantum secure, the whole potential
attack surface should also be examined, mainly with respect to implementation
attacks, i.e., side-channel attacks and fault attacks. In a side-channel
attack, the attacker gains extra critical information (e.g., about a secret
key) by monitoring and/or measuring quantities such as power consumption,
electromagnetic leaks, execution time, etc. In a fault attack, a fault, which
can be either natural or malicious, is a misbehavior of a device that causes
the computation to deviate from its specification, which could also yield some
information on the secret key. HBS schemes are vulnerable to hardware fault
attacks in the presence of both natural and malicious faults, so special
attention should be given to implementing such schemes appropriately. Moreover,
another problem in stateful signature schemes is so-called cloning. Such a
threat occurs whenever a private key is copied and then used without
coordination with execution units (known as non-volatile cloning) or without
coordination with storage units (known as volatile cloning).
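Why cloning is dangerous for stateful schemes can be seen from a minimal model
of the signer's state. The class below is a hypothetical illustration, not part
of any HBS library: it only tracks which one-time key index is handed out next.

```python
import copy


class StatefulSigner:
    """Hands out each one-time key index at most once."""

    def __init__(self, num_keys: int):
        self.num_keys = num_keys
        self.next_index = 0

    def reserve_index(self) -> int:
        if self.next_index >= self.num_keys:
            raise RuntimeError("all one-time keys have been used")
        index = self.next_index
        # In a real system this update must be persisted before the
        # signature is released.
        self.next_index += 1
        return index


signer = StatefulSigner(num_keys=4)
clone = copy.deepcopy(signer)  # e.g. a restored backup or a VM snapshot
used = [signer.reserve_index(), clone.reserve_index()]
print(used)
```

Both copies hand out the same index, so the same one-time key would sign two
different messages, which is exactly the reuse that OTS schemes must avoid.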

Some researchers consider XMSS and SPHINCS to be impractical for blockchain
applications due to their performance (relatively slow signing speed, while the
size of a SPHINCS signature is 41 Kb), so alternatives have been suggested.

Table 7.3. An overall generic evaluation of stateful and stateless HBS schemes [9].

Type       Pros                         Cons                             Use Case
Stateful   - Shorter signature size     - State synchronization          Performance-constrained
           - Faster signature             problem (synchronization       environment
             generation time              failure)
                                        - Cloning problem
Stateless  - No state synchronization   - Longer signature size          Resource-constrained
             problem                    - Slower signature               environment
           - No cloning problem           generation time


Code-based digital signatures


Several post-quantum code-based signing algorithms have been proposed; probably
the best known are the schemes of Niederreiter and CFS (Courtois, Finiasz,
Sendrier), which are similar to McEliece's cryptosystem. The signatures of such
schemes are short and can be verified very fast but, similarly to McEliece's
cryptosystem, the use of large keys requires significant computational
resources and, as a consequence, signature generation may become inefficient
[10].


Multivariate digital signature schemes 


This class of post-quantum signatures typically yields large public keys, but very 
small signatures. Some of the most popular multivariate-based schemes rely on 
Matsumoto-Imai’s algorithm or on variants of HFE, which can generate signatures 
with a size comparable to the currently used RSA or ECC-based signatures. Other 
relevant multivariate-based digital signature schemes have been proposed, such
as Rainbow. In general, it is widely accepted that such cryptosystems need to
be further improved in terms of key size.


Lattice-based digital signature schemes 


Among the several lattice-based signature schemes described in the literature,
the ones based on the Short Integer Solution (SIS) problem seem promising due
to their reduced key size. For several years, it was assumed that BLISS-B
(Bimodal Lattice Signatures B), whose security rests on the hardness of the SIS
problem, could be a very good option due to its good performance. However,
BLISS was found to be vulnerable to side-channel attacks [10]. Besides BLISS,
there are other lattice-based signature schemes in the literature that rely on
the SIS problem but were devised specifically for blockchains [11]. Moreover,
lattice-based blind signature schemes have been used to provide anonymity and
untraceability in distributed blockchain-based applications for the IoT.


Isogeny-based digital signature schemes


Although supersingular elliptic curve isogenies can be used for creating
post-quantum digital signature schemes, few such schemes are known, and they
are not efficient. Some works on this class indicate, though, that
“it is necessary to address key size issues when implementing isogeny-based cryp- 
tosystems and Supersingular Isogeny Diffie-Hellman (SIDH), especially in the case 
of resource constrained devices”. 


Zero-knowledge proofs for digital signatures 


There is one important post-quantum digital signature scheme, called Picnic,
which has a significantly different design principle compared to all the
previous ones. Picnic, which was submitted to the NIST competition, is based on
non-interactive zero-knowledge proofs, where the proof of knowledge is
instantiated using the MPC-in-the-head approach. The signature is a proof of
knowledge of a secret key for a
block cipher that encrypts a public plaintext block to a public ciphertext block, 
which together form the public key of the signature scheme. All the cryptographic 
building blocks can be instantiated using symmetric-key primitives (block ciphers 
and hash functions), whereas the MPC (Multi-Party Computation) protocol can 
be instantiated with information-theoretic security. 


7.3 Blockchain and Post Quantum Cryptography 


To tackle the quantum threat in blockchain technology, several researchers have
proposed post-quantum-enabled blockchain solutions or even some adjustments to
popular distributed ledgers. Commercial blockchains have also analyzed and
addressed the impact of quantum computers. These include the Quantum Resistant
Ledger (QRL), which uses XMSS, IOTA, which uses WOTS, and Corda, which uses
BPQS.


7.3.1 Bitcoin


Bitcoin uses ECDSA over the Koblitz curve secp256k1, together with the hash
function SHA-256, to authorize the transfer of coins and assets. Defined by the
Standards for Efficient Cryptography Group (SECG), the Koblitz curve provides
several advantages, such as efficiency, reduction of the key size and security,
but its main drawback is its weakness against quantum attacks. Therefore, to
secure the digital signatures that are included in Bitcoin transactions against
Shor's algorithm, the authors in [13] implemented a signature scheme based on
the qTESLA algorithm, which uses the BLAKE2 and SHA-3 functions, hence yielding
a scheme with fast signing and verification. However, qTESLA is not present in
the third round of evaluation in the NIST competition.

Research on lattice-based cryptography, which lays the foundation for the
design of quantum-resistant signature schemes, is not only fruitful for
resisting the quantum threat, but is also suitable for blockchains. The authors
in [14] proposed a transparent e-voting blockchain system, which could be
applied in Bitcoin. In this scheme, voters that operate maliciously are
audited, while code-based cryptography is used to resist quantum threats. More
precisely, a certificateless traceable ring signature algorithm is introduced
in the proposed blockchain-enabled e-voting system to solve the problem of
verifying public key certificates, and Niederreiter's code-based cryptosystem
is adopted to address the quantum threat in the e-voting protocol.


7.3.2 Ethereum 


The authors in [15] proposed a framework that encrypts sensitive industrial
data, while the uploader decides with whom these data can be shared. The
architecture is modeled to operate with the popular Ethereum platform and the
InterPlanetary File System (IPFS). However, similar and traditional platforms
are also able to provide the necessary requirements for the framework's
operation. The framework uses the Elliptic-Curve Diffie-Hellman Key Exchange
(ECDH) and the SIDH algorithms. The advantages and drawbacks of each algorithm
are discussed in that paper, which concludes that SIDH is the most suitable
approach because it ensures security against attackers with quantum computing
capabilities. The Ethereum platform is also modified in [16], where the authors
applied a multivariate-based cryptosystem (the Rainbow signature scheme) and
compared its efficiency with the current version of Ethereum, which is based on
ECDSA.


7.3.3 IOTA


IOTA is a popular distributed ledger designed for the IoT ecosystem. The
platform is considered a quantum-resistant, rather than a quantum-proof,
ledger. In particular, it does not use conventional public-key cryptography,
but the IOTA Signature Scheme (ISS), which is based on WOTS. Users in IOTA sign
the message's hash, which means that the security of ISS is based on the
cryptographic strength of the hash function. Therefore, IOTA transactions are
quantum resistant, but require a new private/public key pair to be generated
each time a transaction is signed, because a part of the private key is
revealed in the signature process.


7.3.4 QRL


While designing the QRL, great emphasis was given to the cryptographic security
of its signature scheme, so that it is secure against both classical and
quantum attacks, not only at present but also in future decades. QRL replaces
secp256k1 with XMSS, using the hash function SHA-256, and offers 196-bit
security with expected security against brute-force attacks until the year
2164. The asymmetrical hypertree signature scheme used in QRL consists of
chained XMSS trees and provides the dual advantage of using a validated
signature scheme and of allowing the generation of ledger addresses that can
sign transactions without the pre-computation delay observed in XMSS
constructions.


7.3.5 Corda 


Corda typically supports conventional public key signature algorithms, such as
ECDSA and RSA (the default signature is ECDSA with the NIST P-256 curve, i.e.,
secp256r1). However, at an experimental level, SPHINCS has been employed to
provide post-quantum security. Moreover, very recently, researchers from R3
(i.e. the company supporting Corda) proposed the aforementioned BPQS signature
scheme, an improvement of XMSS in which the blockchain itself plays a role in
the construction, thus forming a blockchained signature scheme.


7.3.6 Hyperledger Fabric


The Hyperledger Fabric does not provide (by default) post-quantum security. How- 
ever, it has been announced that achieving post-quantum security is one of the 
priorities with respect to further advancements of the ledger. To this end, such an 
approach has been very recently suggested in a research paper [17]. The researchers 
present the so-called PQFabric, which is the first version of the Hyperledger Fabric 
enterprise permissioned blockchain whose signatures are secure against both classi- 
cal and quantum computing threats. In this paper, the researchers implement and 
analyze hybrid signatures that are configurable with any post-quantum signature 
algorithm. 

The authors redesign the credential-management procedures and specifications of
the Fabric network and create hybrid signatures that combine classical and
quantum-safe digital signatures. The comparative benchmarks of PQFabric are
performed with some of the NIST candidates and alternates, namely Falcon-512,
Falcon-1024, Dilithium-2, Dilithium-3, Dilithium-4 and qTesla-p-I. The proposed
system is built on top of Fabric v1.4 and LibOQS v0.4, which is used for the
implementation of the post-quantum cryptographic algorithms.

The integration presented in [17] was not straightforward: three core modules
of Fabric's codebase were modified to allow the incorporation of hybrid
quantum-safe signatures: (1) the Blockchain Cryptographic Service Provider
(BCCSP), which offers a uniform interface that calls the relevant signature
scheme based on the key type in use; (2) the local Membership Service Provider
(MSP), which extracts the cryptographic keys, both public and private, from the
X.509 certificate (hybrid quantum-classical cryptography needs two keys); and
(3) cryptogen, a utility used to create the cryptographic material needed to
run the Fabric platform from its configuration files. The modified MSP thus
obtains the private and public keys from the X.509 certificate, stores them for
each node in an internal structure and then provides them to the BCCSP module
every time a message is signed. The signature scheme simply allows LibOQS to
re-hash the already hashed message, but this has a cost for the platform's
performance. In particular, the speed of the signature algorithm is the key
factor that impacts the performance of schemes with larger signature sizes and
keys.
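The idea of a hybrid signature can be sketched as a pair of signatures that
must both verify. The combination rule shown here (accept only if both
components are valid) is one common choice and an assumption of this sketch,
not a description of PQFabric's code. The two "schemes" are HMAC-based
placeholders so that the example runs with the standard library only; they are
not real public-key signatures and all names are hypothetical.

```python
import hashlib
import hmac
from typing import Callable, NamedTuple, Tuple


class Scheme(NamedTuple):
    sign: Callable[[bytes], bytes]
    verify: Callable[[bytes, bytes], bool]


def toy_scheme(key: bytes) -> Scheme:
    """Placeholder built from HMAC; stands in for ECDSA or a PQC signature."""
    def sign(message: bytes) -> bytes:
        return hmac.new(key, message, hashlib.sha256).digest()

    def verify(message: bytes, signature: bytes) -> bool:
        return hmac.compare_digest(sign(message), signature)

    return Scheme(sign, verify)


def hybrid_sign(classical: Scheme, post_quantum: Scheme,
                message: bytes) -> Tuple[bytes, bytes]:
    return classical.sign(message), post_quantum.sign(message)


def hybrid_verify(classical: Scheme, post_quantum: Scheme,
                  message: bytes, signature: Tuple[bytes, bytes]) -> bool:
    classical_sig, pq_sig = signature
    ok_classical = classical.verify(message, classical_sig)
    ok_post_quantum = post_quantum.verify(message, pq_sig)
    return ok_classical and ok_post_quantum  # both components must be valid


classical = toy_scheme(b"classical-key")
post_quantum = toy_scheme(b"post-quantum-key")
signature = hybrid_sign(classical, post_quantum, b"endorse tx 42")
print(hybrid_verify(classical, post_quantum, b"endorse tx 42", signature))
```

With this rule a forger has to break both components, at the price of carrying
two keys and two signatures per message.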


7.4 Performance and Resistance of Potential Blockchain 
Post-Quantum Cryptosystems 


7.4.1 Performance Assessment


The performance of post-quantum digital signatures has been extensively studied
in the literature. Such performance evaluations have been carried out on
several underlying hardware platforms, as well as in several networking
protocols with various assumptions on the underlying communication channel. In
the case of Falcon, the authors measured its performance in terms of time spent
instead of cycles. For Rainbow, the values indicate the performance of the
key-compressed version, which requires much more computational effort than the
regular version due to the decompression process involved. Most cryptosystems
have been evaluated after optimizing them for AVX2, a 256-bit instruction set
provided by Intel. The only exception is SPHINCS in its HARAKA version, whose
optimized implementation takes advantage of the AES-NI instruction set.

It is interesting to point out that the performance evaluation presented in
Table 7.4 is based on hardware that can be used for running either a regular
blockchain node (i.e., a node that only interacts with the blockchain) or a
full blockchain node (i.e., a node that stores and periodically updates a copy
of the blockchain and is able to validate blockchain transactions).

Table 7.4. An overall performance evaluation of the post-quantum signatures
present in the 3rd round of the NIST evaluation [19].

Scheme      Algorithm                     Execution Time (ms)   Size (Bits)
Dilithium   Dilithium II                  KeyGen = 0.18         Ks = 22,400
                                          Sign = 0.82           Kp = 9,472
                                          Ver = 0.16            σ = 16,352
Falcon      Falcon-512                    KeyGen = 16.77        Ks = 10,248
                                          Sign = 5.22           Kp = 7,176
                                          Ver = 0.05            σ = 5,52
Rainbow     Rainbow-Ia-Cyclic             KeyGen = 0.48         Ks = 743,680
                                          Sign = 0.34           Kp = 465,152
                                          Ver = 0.83            σ = 512
GeMSS       GeMSS128                      KeyGen = 13.1         Ks = 107,502
                                          Sign = 188            Kp = 2,817,504
                                          Ver = 0.03            σ = 258
Picnic      Picnic-L1-FS                  KeyGen = 0.005        Ks = 128
                                          Sign = 4.09           Kp = 256
                                          Ver = 3.25            σ = 272,256
SPHINCS+    SPHINCS+-SHA256-128f-simple   KeyGen = 2.95         Ks = 512
                                          Sign = 93.37          Kp = 256
                                          Ver = 3.92            σ = 135,808
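To relate the sizes in Table 7.4 to blockchain storage, the following sketch
converts some of its rows from bits to bytes and adds public key and signature
together. The assumption that each transaction carries exactly one public key
and one signature is made here for illustration; real platforms may store keys
differently. Reading Kp as the public key and σ as the signature is likewise an
interpretation of the table's symbols.

```python
# Public key (Kp) and signature sizes in bits, taken from Table 7.4.
SIZES_BITS = {
    "Dilithium II": {"public_key": 9_472, "signature": 16_352},
    "Rainbow-Ia-Cyclic": {"public_key": 465_152, "signature": 512},
    "Picnic-L1-FS": {"public_key": 256, "signature": 272_256},
    "SPHINCS+-SHA256-128f-simple": {"public_key": 256, "signature": 135_808},
}


def footprint_bytes(entry: dict) -> float:
    """Bytes added per transaction if it carries one key and one signature."""
    return (entry["public_key"] + entry["signature"]) / 8


ranking = sorted(SIZES_BITS, key=lambda name: footprint_bytes(SIZES_BITS[name]))
for name in ranking:
    print(f"{name}: {footprint_bytes(SIZES_BITS[name]):.0f} bytes")
```

Under this assumption, a scheme with short signatures but large public keys
(such as Rainbow) can still impose the largest per-transaction footprint.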

The conclusions derived can be summarized as follows. First, with respect to
multivariate-based cryptosystems, MQDSS provides small keys and its lightest
version is quite fast, but its signatures are among the largest in the
comparison. In contrast, the rest of the compared multivariate-based schemes
have large keys but generate short signatures; note also that MQDSS does not
continue to the third round.

Next, with respect to lattice-based signatures, they generally require smaller keys
than the multivariate schemes, but they produce larger signatures. Amongst all
of them, FALCON, which continues to the third round of the NIST competition,
makes use of the smallest key sizes and signature lengths. qTESLA is also fast,
but its major drawback is its large key sizes; qTESLA is not present in the
third round of evaluation in the NIST competition. The fastest scheme is
DILITHIUM (amongst all the types of post-quantum signatures, not only amongst
the lattice-based ones). In terms of performance, DILITHIUM obtains results very
similar to ECDSA-256. Unfortunately, DILITHIUM key sizes are much larger than
the ones used by ECDSA-256.


Table 7.5. Time (ms) of key-pair generation, signing and verification [7].

Scheme                        KeyGen   Sign    Verify
BPQS (w = 4, SHA256)          0.569    0.08    0.10
BPQS (w = 4, SHA384)          1.107    0.16    0.19
BPQS (w = 16, SHA256)         0.872    0.19    0.20
BPQS (w = 16, SHA384)         1.719    0.39    0.38
ECDSA SECP256K1 (SHA256)      0.10     0.34    0.25
Pure EdDSA Ed25519 (SHA512)   0.18     0.08    0.16
RSA3072 (SHA256)              561.1    5.39    0.17
SPHINCS-256 (SHA512)          0.69     144.5   1.76

However, apart from Dilithium, another option that achieves good performance
is the lightest version of Rainbow. This is confirmed both by the
aforementioned results in [10] and by the evaluation over the TLS protocol [18].
Note also that Rainbow necessitates smaller parameters than Dilithium, thus
rendering the algorithm a very strong candidate for future (including
blockchain) applications. Falcon provides the best verification time, but it is
slow in signing. The slowest digital signature algorithms are Picnic, GeMSS and
SPHINCS (all of them are alternate algorithms in the NIST competition).

In order to summarise the results in terms of performance, Table 7.4 presents
the performance of the candidates (and the alternates) in the third round of
the NIST competition. This table is based on the results from [18], which are in
full agreement with the survey presented in [10].

As stated above, SPHINCS is generally a very slow signing algorithm. It is
interesting to point out, though, that BPQS, which is also hash-based (and
outside of the NIST competition), achieves better performance than SPHINCS
while being blockchain-oriented. This is illustrated in Table 7.5. It can be
seen that, for all the BPQS parameter choices considered, BPQS is much faster
than SPHINCS in terms of signing and verifying (with performance actually
comparable to traditional public-key digital signature schemes). The main
drawback is the key generation time, which is nevertheless comparable, in some
cases, with that of SPHINCS. Regarding the signature size, all BPQS modes
outperform XMSS for the first few signatures. However, BPQS signatures grow
linearly with the number of times a key is reused; thus, the length of the
signature output is dynamic (it starts small and increases with each additional
signature).


Table 7.6. Time for generating XMSS trees for a QRL wallet [20].

XMSS Tree Height   No. of OTS Signatures   Hash Function/Algorithm   Gen. Time
18                 262,144                 SHA2_256 / SHA2           1h 10min 49sec
10                 1,024                   SHAKE_128 / SHA3          11sec
12                 4,096                   SHA2_256 / SHA2           1h 20sec
12                 4,096                   SHAKE_128 / SHA3          48sec
12                 4,096                   SHAKE_256 / SHA3          46sec


Table 7.7. Information on transactions in QRL [20].

Transaction    Signing   Signature      Verification   Block #   Block Size
Size (Bytes)   Time      Size (Bytes)   Time                     (Bytes)
2662           1sec      2500           4min 36sec     81188     2915
2662           1sec      2500           9sec           81168     2915
2662           1sec      2500           3min 0sec      80944     2915
2704           -         2500           -              80939     2958
2662           1sec      2500           1min 2sec      80205     2915
2662           1sec      2500           24sec          66804     2915
2705           -         2500           -              66739     2959


It is also interesting to look more closely at XMSS, and especially at QRL, a
ledger that supports XMSS in order to achieve post-quantum security by default.
It is known that XMSS has several limitations (which is why SPHINCS and BPQS
are considered improvements over XMSS); however, XMSS is indeed a cryptographic
primitive that is currently used in a post-quantum secure commercial
blockchain.

We next present recent experimental results on QRL, aiming to assess in
practice the performance of QRL (implementing XMSS) on a conventional
workstation [20]. The experiments were conducted on an Intel Core2Duo E6750 @
2.66 GHz processor, with 6 GB of RAM (DDR2 @ 400 MHz) and Windows 10 Pro,
64-bit, as the operating system. To perform several measurements, the researcher
produced several wallets with different XMSS parameters. The results are shown
in Table 7.6.

Moreover, the researcher in [20] performed several transactions in a testing
environment (provided by QRL), with the ultimate goal of observing in practice
the corresponding signing and verification times. These are shown in Table 7.7
for the second wallet. As the table shows, the size of the signature is
constant, which is expected since the size of the signature depends on the
height of the XMSS tree (or, equivalently, on the number of OTS signatures).
More precisely, in QRL the size of the signature is given by the relation
2180 + (height * 32) bytes. The variations in verification time are probably
due to the load of the miner in the tested blockchain at the time the
experiments took place.
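
As a quick check of the relation above, the following sketch computes the
signature size and the number of OTS signatures for a given tree height. The
function names are ours and are not part of QRL; the only inputs are the
relation quoted in the text and the fact that a binary tree of height h has
2^h leaves. For height 10 (the second wallet) it gives the 2500-byte signature
of Table 7.7 and the 1,024 OTS signatures of Table 7.6.

```python
def qrl_signature_size_bytes(height: int) -> int:
    """Signature size in QRL, using the relation 2180 + (height * 32)."""
    return 2180 + height * 32


def ots_signature_count(height: int) -> int:
    """A binary tree of the given height has 2**height leaves (OTS keys)."""
    return 2 ** height


# Tree heights used for the wallets of Table 7.6.
for h in (10, 12, 18):
    print(h, ots_signature_count(h), qrl_signature_size_bytes(h))
```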


7.4.2 Attacks on PQC Primitives 


NIST has stressed the importance of side-channel attacks (SCA) and
countermeasures. More precisely, in the original NIST PQC call for proposals in
2016, it was stated that “the Schemes that can be resistant to SCA at lower cost
are more preferable than those whose performance is severely hampered by any
attempt to resist side-channel attacks.” NIST also hopes to see implementations
with protective mechanisms against side-channel attacks, such as timing attacks,
fault attacks, power monitoring attacks, etc. Therefore, this section presents a
number of SCA and ISD attacks against the NIST PQC 3rd round candidates.
These attacks on NIST’s 3rd round candidates are categorized as:


• Classical Cryptanalysis (CC), which mathematically analyses the
  corresponding cryptosystem.

• Static Timing Analysis (STA), which exploits the variable runtime of an
  algorithm.

• Fault Attacks (FA), which are semi-invasive techniques to deliberately induce
  faults and disclose cryptographic internal states.

• Simple Power Analysis (SPA) and Advanced (differential/correlation) Power
  Analysis (APA), which non-invasively exploit the variations in the
  cryptographic algorithm’s power consumption.

• Electromagnetic attacks (EMA), which exploit the radiation from a
  cryptographic algorithm.

• Template attacks (TA), which use a sensitive device to obtain access to the
  secret.

• Cold-boot attacks (CBA), which exploit the memory remanence to read data
  out of a computer’s memory when the computer has been turned off.

• Countermeasures (CM), which protect against or hinder attacks through masking
  or hiding techniques.


Table 7.8 presents which schemes are directly susceptible to the
aforementioned attacks.

Table 7.8. A summary of attacks on NIST PQC 3rd round candidates.

SCA
Algorithm CC STA FA SPA APA EMA TA CBA CM


Finalists KEMs Classic McEliece, Vv Vv Vv 
Kyber Vv Vv Vv 
NTRU Vv Vv 
Saber Vv V 
Signs Dilithium v V v 
Falcon y 
Rainbow V V v 
Alternatives KEMs BIKE VA 
FrodoKEM 
HQC V V 
NTRU Prime v v v 
SIKE vv 
Signs GeMSS V v 
Picnic v V 
SPHINCS+ v


7.5 Conclusions and Future Directions in PQC Blockchains


This chapter considered the post-quantum security aspects in blockchain
technology. More precisely, it has assessed contemporary PQC algorithms and the
current situation of NIST’s 3rd round PQC candidates. In addition, it has
presented the impact of quantum-computing attacks on blockchains and it has
investigated the incorporation of PQC primitives in blockchains.

Currently, quantum computing is an area that has gained a lot of interest from
both academia and industry. Consequently, new attacks might be developed
against post-quantum cryptosystems. It is therefore necessary for both
researchers and industry to stay aware of advances in quantum computing; for
this reason, we present the challenges and the future directions in PQC
blockchains.


7.5.1 Transitioning to Post-quantum Blockchains


The transition to post-quantum blockchains requires the steps involved to be
considered carefully. To this end, several researchers have proposed new methods
for bringing post-quantum security to blockchain technology. For example, in
[21] the authors introduced a scheme that extends the validity of the
blockchain if the security of the digital signatures or of the hash functions
is imperiled. However, hard forks or smooth-forks might occur and, for this
case, the authors proposed a soft-fork mechanism [22]. In another work [23], a
commit-delay-reveal protocol is proposed that enables Bitcoin users to move
funds from the non-quantum-resistant protocol to a version that adheres to a
quantum-resistant signature scheme. This transition protocol can work well even
if ECDSA has already been compromised.
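
To give a feel for the commit, delay, reveal idea, the following is a minimal
sketch of a generic hash commitment with a waiting period. It is not the
protocol of [23]: the Ledger class, the DELAY_BLOCKS value and the use of
SHA-256 are assumptions made purely for illustration, and signatures, fees and
fork handling are left out.

```python
import hashlib

DELAY_BLOCKS = 6  # hypothetical waiting period, in blocks


def commitment(old_pk: bytes, new_pk: bytes, nonce: bytes) -> bytes:
    """Hash that binds the old key to the new (quantum-resistant) key."""
    h = hashlib.sha256()
    for part in (old_pk, new_pk, nonce):
        h.update(len(part).to_bytes(4, "big"))  # length prefix avoids ambiguity
        h.update(part)
    return h.digest()


class Ledger:
    """Toy ledger that only records commitments and the current height."""

    def __init__(self) -> None:
        self.height = 0
        self.commits = {}  # commitment digest -> height at which it was recorded

    def commit(self, digest: bytes) -> None:
        self.commits.setdefault(digest, self.height)

    def mine(self, blocks: int = 1) -> None:
        self.height += blocks

    def reveal(self, old_pk: bytes, new_pk: bytes, nonce: bytes) -> bool:
        """Accept the reveal only if a matching commitment is old enough."""
        committed_at = self.commits.get(commitment(old_pk, new_pk, nonce))
        if committed_at is None:
            return False
        return self.height - committed_at >= DELAY_BLOCKS
```

In this toy model, a party that learns the old key only when it is revealed
cannot produce a commitment that is already old enough, which is the intuition
behind inserting a delay between the two steps.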


7.5.2 Keys - Signature Sizes and Performance Challenges 


The key sizes in post-quantum cryptosystems range between 128 and 4,096 bits,
meaning that post-quantum cryptosystems demand much larger keys than
traditional public-key cryptosystems. Some signature cryptosystems, which are
based on supersingular isogenies, appear promising for solving the key size
issue, but such schemes generate large signatures and provide poor performance
compared to traditional public-key cryptosystems. As one issue is seemingly
solved, several others are created, since blockchains store a vast number of
signatures. In a similar way, hash-based cryptosystems have comparatively small
key sizes, which contrasts with the size of their signatures, often more than
40 KB. On the other hand, the majority of the multivariate-based cryptosystems
generate short signatures, but the keys used for their generation and
verification might need several kilobytes. Lattice-based cryptosystems such as
DILITHIUM are very fast, but their signature length is 2701 bytes and their key
size is approximately 1500 bytes.
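
To make the storage concern concrete, the sketch below converts some of the
sizes reported in Table 7.4 from bits to bytes and estimates the space taken
per transaction. The accounting rule (one public key plus one signature stored
per transaction) is an assumption for illustration only; actual blockchains
differ in what they store. Note that the Table 7.4 figures refer to specific
parameter sets and therefore differ from the DILITHIUM figures quoted above.

```python
# Sizes in bits, copied from Table 7.4 (Kp = public key, sigma = signature).
SIZES_BITS = {
    "Dilithium II": {"public_key": 9_472, "signature": 16_352},
    "Rainbow-Ia-Cyclic": {"public_key": 465_152, "signature": 512},
    "GeMSS128": {"public_key": 2_817_504, "signature": 258},
    "Picnic-L1-FS": {"public_key": 256, "signature": 272_256},
}


def bits_to_bytes(bits: int) -> int:
    """Round up to a whole number of bytes."""
    return (bits + 7) // 8


def bytes_per_transaction(scheme: str) -> int:
    """Assumed footprint: one public key plus one signature per transaction."""
    sizes = SIZES_BITS[scheme]
    return bits_to_bytes(sizes["public_key"]) + bits_to_bytes(sizes["signature"])


for scheme in SIZES_BITS:
    per_tx = bytes_per_transaction(scheme)
    gigabytes = per_tx * 1_000_000 / 1e9
    print(f"{scheme:18s} {per_tx:>8d} bytes/tx  {gigabytes:8.2f} GB per million tx")
```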

Post-quantum cryptosystems need a considerable amount of (a) execution time,
(b) computational resources and (c) storage resources. Some schemes also limit
the number of messages that can be signed with the same key. This practice
results in the repeated generation of new keys and in the dedication of
computational resources to this purpose that could otherwise be used for
blockchain processes. Overall, current research in post-quantum cryptosystems
has not yet achieved a good trade-off between the size of the keys and the
scheme’s performance for blockchains. Therefore, novel approaches are required,
which will minimize the cryptosystems’ energy consumption and, consequently,
their impact on the performance of the blockchain network.


7.5.3 General Directions


A large distributed network, such as a blockchain, necessitates exceptional
consideration when migrating to post-quantum cryptography, because of the
constraints on downtime and on synchronous updates. Such transitions require
not only performance assurance and backwards compatibility, but also slow
rollouts and rollbacks. Therefore, a post-quantum implementation of a
blockchain network requires the following steps:


I. Software rollout: A slow rollout of the software to all the network’s peers.
This migration should be backwards compatible, so that the nodes can
continue to sign and verify signatures, as well as validate X.509
certificates, classically until they change to a post-quantum mode.

II. Key rollover: When the certificate authority is updated with a
post-quantum key, the node certificates should be re-issued following a key
rollover method.

III. Slow rollout of the PQC keys: Once the post-quantum key-pairs have been
generated, the configuration files of each node that belongs to the
network should be updated.

IV. The final step is the rollout of post-quantum keys to the client peers.
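
The "slow rollout with rollback" pattern that underlies these steps can be
sketched as follows. This is a generic staged-deployment illustration and not a
procedure taken from any blockchain platform: the function slow_rollout, the
batch size, the node names and the health check are all hypothetical.

```python
def slow_rollout(nodes, batch_size, upgrade, rollback, healthy):
    """Upgrade nodes batch by batch; undo the current batch and stop on failure.

    Returns the list of nodes that stay upgraded and a success flag.
    """
    upgraded = []
    for start in range(0, len(nodes), batch_size):
        batch = nodes[start:start + batch_size]
        for node in batch:
            upgrade(node)
        if not all(healthy(node) for node in batch):
            for node in batch:
                rollback(node)
            return upgraded, False
        upgraded.extend(batch)
    return upgraded, True


# Toy network: every node starts in classical mode.
mode = {name: "classical" for name in ("n1", "n2", "n3", "n4", "n5")}


def upgrade(node):
    mode[node] = "post-quantum"


def rollback(node):
    mode[node] = "classical"


def healthy(node):
    return node != "n4"  # pretend that n4 fails its post-upgrade check


done, ok = slow_rollout(list(mode), 2, upgrade, rollback, healthy)
print(done, ok, mode)
```

Nodes that have not yet been upgraded keep operating classically, which is why
backwards compatibility is required throughout the migration.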


All the above steps should therefore be taken into consideration when
implementing post-quantum digital signatures or encryption algorithms in a
blockchain platform.


The Internet of Things has enabled the interconnection of billions of devices,
which cooperate to support a large number of applications and application
features. In this context, the number of devices that need to interact to
realize the desired functionalities has substantially grown, and this has
rendered traditional access control methods hard to manage and ineffective. To
respond to this challenge, trust-based access control has emerged, where each
device is assigned a level of trust, and this level is consulted to determine
whether data and operation accesses should be permitted or declined. In this
chapter, we propose an approach to trust computation in the Internet of Things,
which synthesizes behavioral, device status and associated risk aspects into a
comprehensive trust score that can be consulted to realize trust-based access
control. The proposed approach also considers device ownership relationships
and owner-to-owner trust relationships, which are utilized in the trust
computation process.


8.1 Introduction: Background and Driving Forces 


In the context of computing, parties interact with each other to access services and 
information. Traditionally, access control mechanisms are employed to safeguard 
such accesses: authentication mechanisms provide the necessary guarantees about 
the identities of the interacting parties (i.e., that either the service/information 
requestor or the server are indeed who they claim they are), whereas authorization 
mechanisms enforce information/service access policies, ensuring that only autho- 
rized clients can access the information/service resources provided by the servers. 
While this approach is adequate for a number of information system use cases, and 
predominantly in client-server systems where a closed set of clients or client groups 
interact with a limited set of servers that are known a priori, modern internet-scale 
computing necessitates the interaction between unknown parties, with each party 
being able both to request and offer services and/or information. In such an envi- 
ronment, traditional access control systems are deemed insufficient, since interact- 
ing parties are highly likely to be unknown to each other before the beginning of 
the interaction. In this respect, a different approach is needed to allow interacting 
parties to decide: 


1. Whether the requestor is entitled to access the service/information 
requested and 

2. Whether the provider is trusted as a source of the particular ser- 
vice/information. 


To address the issues listed above, the concept of trust management has been
introduced. The authors in [1] define trust management as an underpinning that
facilitates the enforcement of security policies by verifying actions against
these policies, in an automated fashion. Following this definition, the
execution of an action is permitted if the interacting party has provided
credentials that are assessed to be sufficient; if this holds, the interacting
party’s actual identity need not be known or verified. In other words, the
checks made need only process and verify some symbolic representation of the
requesting party’s trust level, which is now clearly distinguished from the
requesting party itself (a person or an agent acting on behalf of the person).
To further promote the benefits of the trust-based approach, the presentation
and validation of credentials can be replaced by the inspection and assessment
of a set of properties, which are testified for and validated by some
interacting party, while digital certificates are used to represent the
aforementioned properties and safeguard their validity [2-4].

Following this rationale, the initial collection of trust management system ele- 
ments listed in [4] is revised as described below: 


1. Security policies, which comprise a group of trust assertions that are regarded 
as “ground truth” and are therefore trusted in all cases. 

2. Trust-related properties, which represent characteristics of communicating 
parties that are pertinent to the enforcement of security policies; typically, 
such properties are examined as antecedents of rules that comprise a secu- 
rity policy. Trust-related policies are safeguarded through digital signatures 
or other prominent means. 

3. Trust relationships, which are a special kind of security policy.


While the scheme presented above explicitly lists two interacting parties, i.e., the 
service/information requestor and server, trust establishment may involve more par- 
ties, resulting in a highly decentralized model: firstly, trust-related properties may 
be (and typically are) provided and testified for by third parties. Secondly, trust rela- 
tionships may designate other trust management system entities with which a trust 
management system instance liaises to exchange any of the system elements listed 
above (security policies, trust-related properties or trust relationships), including 
also trust assessments that can be taken into account when a trust management 
system instance assesses the trust level of an interacting party. 

The trust level of an interaction peer may be computed by taking into account
all its observable characteristics: this includes (a) the security
characteristics of the interaction peer, along with the current evaluation of
the peer’s integrity assessment (possible compromise of firmware, operating
system, system files; security patch version; etc.) and security defenses
employed by the device (firewalls; IDS/IPS; etc. [5]) and (b) behavioral
characteristics of the interaction peer, relating to whether the interaction
peer (i) functions in compliance with its predefined usage description and (ii)
exhibits abnormal behavior.

Services, information and resources are actually assets which hold a value for
their respective owners and thus necessitate protection through trust
management or other pertinent means. Protection aims to safeguard assets from a
number of threats, which pose risks to them and may ultimately lead to the
demotion of their value [5]. As a result, the process of protecting the assets
must incorporate a risk assessment of each interaction, and the choice and
application of the appropriate defensive measures as dictated by the
assessment’s results. This is in line with the procedure described in the
ISO/IEC 27001 standard [6] for addressing risks, which encompasses the
following two steps:


1. information security risk assessment, which is further refined into (i)
establishment and maintenance of information security risk criteria that
include the risk acceptance criteria, (ii) identification of information
risks, (iii) analysis of information security risks and (iv) evaluation of
information security risks, and

2. information security risk treatment, where (i) suitable options for mitigating
information security risks are chosen, after considering the outcomes of risk
assessment, (ii) appropriate controls for the realization of the chosen security
risk treatment options are chosen, taking also into account the cost/benefit
ratio of applying the chosen security risk treatment options and (iii) the
information security risk treatment approach is validated, after reviewing any
residual information security risks and knowledgeably accepting their
presence (or returning to the step of choosing appropriate controls).


Trust and risk assessment are two closely associated concepts, following the ratio- 
nale that the evaluation of information security risks involves a calculation of the 
probability that the risks in question will occur [6], and the result of this calcu- 
lation is dependent on the trust level that is assigned to systems that could prove 
to be threat agents. This rationale is reflected in definitions of trust found in the
literature: according to [7] “Trust is the willingness of a party to be vulnerable to
the action of another party based on the expectation that the other will perform
a particular action important to the trustor, irrespective of the ability to monitor
or control that other party”; on the same note, [8] defines trust as “An attitude of
confident expectation in an online situation of risk that one’s vulnerabilities will 
not be exploited”. These lead us to the conclusion that trust reduces the level of 
risk, based on the conviction that a trusted system will not ultimately operate as a 
threat agent. Overall, a system’s trust assessment must be incorporated as a critical 
parameter of a risk assessment. 

Finally, attackers are increasingly employing more complex attack methods 
which include multi-stage, multi-host attack paths, with each path representing 
a series of exploits utilized by the attacker to compromise a network [10]. To this 
end, attack graphs can be employed to perform a comprehensive risk analysis of a 
network, by taking into account the cause-consequence relationships involved in a 
network’s shifting states. Furthermore, the probability of the exploitation of such 
relationships can also be considered [9]. 
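
As an illustration of how exploitation probabilities can be attached to an
attack graph, the sketch below enumerates the attack paths between an entry
point and a target and multiplies the probabilities along each path. The graph,
the node names, the probability values and the assumption that exploits succeed
independently of each other are all hypothetical; they are not taken from
[9] or [10].

```python
from math import prod

# Hypothetical attack graph: each edge weight is the assumed probability that
# the exploit leading from one node to the next succeeds.
GRAPH = {
    "internet": {"camera": 0.6, "router": 0.2},
    "camera": {"hub": 0.5},
    "router": {"hub": 0.9},
    "hub": {},
}


def attack_paths(graph, source, target, visited=()):
    """Yield every cycle-free path from source to target as a tuple of nodes."""
    path = visited + (source,)
    if source == target:
        yield path
        return
    for nxt in graph[source]:
        if nxt not in path:
            yield from attack_paths(graph, nxt, target, path)


def path_probability(graph, path):
    """Product of the edge probabilities, assuming independent exploits."""
    return prod(graph[a][b] for a, b in zip(path, path[1:]))


paths = list(attack_paths(GRAPH, "internet", "hub"))
most_likely = max(paths, key=lambda p: path_probability(GRAPH, p))
print(most_likely, path_probability(GRAPH, most_likely))
```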

In this chapter, we first review existing trust- and risk-based approaches to
security and identify areas of improvement, with a special focus on the domain
of the Internet of Things. Subsequently, we present an approach for trust
computation, which synthesizes different aspects into a single, comprehensive
trust score that can be used for applying trust-based access control. We also
describe an architecture for realizing the proposed approach.


8.2 Fundamentals of Trust Management 


In this section we review the three main foundations of trust and risk
management, namely (a) behavioral-based methods, focusing on the observed
interactions of the devices, (b) status-based methods, focusing on the devices’
security aspects and (c) risk assessment-oriented methods, focusing on the
quantification of the risk associated with the devices and operations. For each
of the three foundations, we present methods, tools and information sources
that can be employed for realizing trust and risk management in the relevant
context.


8.2.1 Behavioral Aspects 


The behavior of a device can be monitored and used in the process of trust and risk 
assessment. The term “behavior” in this context refers to the observable activities 
performed by the device, and this predominantly includes network traffic directed 
towards other nodes. This network traffic can be: 


• Compared against a predefined static model of behavior that has been specified
  for the device and prescribes the operation of a benign instance of the device.
  Deviations from the prescribed behavior are then treated as indications of
  malicious behavior and demote the trust level, correspondingly increasing
  the risk level. Manufacturer Usage Description Specification files [11] are
  the main tool in this area.

• Compared against a dynamically built model of behavior for the device; under
  this approach, the behavior of the device instance is profiled at a state that
  is known to be benign, and further behavior is compared against the
  baseline within the profile. Deviations from the baseline are flagged as
  anomalies, reducing the trust level and increasing the associated risk.
  Provisions for dynamic evolution of the profile can be made.

• Matched against a known set of malicious requests. Under this approach, the
  network traffic emanating from the device is matched against a malicious
  requests signature database, to identify whether the device is the source
  of attacks to other devices; if so, it can be concluded that the device has
  been compromised, and consequently trust and risk assessments are adjusted
  accordingly.

Another aspect that can be taken into account at this point concerns the
observable consequences of information flows, rather than the information flows
themselves. Under this viewpoint, information that has leaked from a device
(e.g., user passwords or personal data) constitutes evidence that the device
does not provide an adequate level of security (including the case that it
discloses information to entities that should not be trusted), and on these
grounds the trust level of this device is reduced.
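
A minimal sketch of the first option listed above, the comparison of observed
traffic against a static model, is given below. The allow-list format, the host
names and the fixed penalty per deviation are assumptions made for illustration
only; actual MUD files are JSON documents with a richer structure [11].

```python
# Hypothetical static model: the only flows a benign device is expected to
# open, written as (destination host, port, protocol).
ALLOWED_FLOWS = {
    ("updates.example.com", 443, "tcp"),
    ("ntp.example.com", 123, "udp"),
}


def deviations(observed, allowed):
    """Return the observed flows that the static model does not permit."""
    return [flow for flow in observed if flow not in allowed]


def behavioural_trust(observed, allowed, penalty=0.2):
    """Start from full trust (1.0) and subtract a penalty per deviation."""
    return max(0.0, 1.0 - penalty * len(deviations(observed, allowed)))


observed_flows = [
    ("updates.example.com", 443, "tcp"),
    ("203.0.113.7", 23, "tcp"),  # unexpected telnet connection
]
print(deviations(observed_flows, ALLOWED_FLOWS))
print(behavioural_trust(observed_flows, ALLOWED_FLOWS))
```

A dynamically built profile could reuse the same comparison, with the allowed
set learned while the device is known to be benign instead of being specified
in advance.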


8.2.2 Status-based Approaches 


Status-based approaches to trust and risk assessment examine the current state
of the interacting device, regarding its security aspects. The goal is to
determine (a) whether a breach has already been made to the device, having
resulted in tampering of either software or its configuration and (b) how prone
the device is to breaches, in the sense that known vulnerabilities have not
been handled appropriately and in a timely manner through the installation of
patches. The security controls that apply to the device are also taken into
account, since they moderate the device’s vulnerability levels. In more detail,
the following aspects are considered in status-based approaches:


• Have critical files been tampered with? Relevant validations span across:

    o the device's firmware;
    o the operating system and other software;
    o the system/network config files;
    o the audit and event logs.

• Have the latest patches been installed? Missing patches increase the
  vulnerability level of the device and therefore demote the trust level.

• Which security controls are in effect to protect the device?

8.2.3 Risk Assessment 


Nowadays, the security of, and trust placed in, digital systems have become an
ever-growing concern as technology plays an increasingly important role in our
societies. An important manifestation of this aspect is the abundance of attacks
deployed against organizations, governmental bodies and society [12]. The
mitigation of such attacks traditionally entails cybersecurity risk assessments
which aid in the identification of critical assets, the threats they are exposed
to, the probability of a successful attack, and the potential consequences.
This approach, along with the prioritization of the identified risks, is the
only way to identify the appropriate measures to be applied [12].

Risk assessment encompasses the identification, estimation and prioritization
of the risks linked to an organization’s assets and operations. This activity
plays a critical role in the context of risk management, by providing the basis
for the treatment of identified risks. The possible treatment approaches are:
risk acceptance, when the risk level is deemed acceptable after consideration
of the organization’s risk management policy; risk mitigation, through security
controls; risk transfer, by delegating accountability to an insurance company;
or risk avoidance, through the removal of the corresponding asset. Some of the
core concepts of risk assessment include but are not limited to: assets,
vulnerabilities, threats, attack likelihood, and impact [13].

An asset can be any item that holds value for an organization, and is
characterized by several properties. Assets can be classified as tangible (e.g.,
hardware) or intangible (e.g., public image of a business); additionally, assets
can be a constituent part of a system or be the entire system. Vulnerabilities
are properties of the assets that can be exploited, and can be defined as
weaknesses of the assets themselves or weaknesses of the controls that protect
them. A threat is an action that could compromise an asset, and is usually
associated with the exploitation of a vulnerability. A threat can occur
deliberately (e.g., applying a brute force attack to find the administrator’s
password) or unintentionally (e.g., erasing a file through an erroneous
action). These concepts are combined in the term cyber-risk, which defines the
probability of a successful threat (attack) emerging and the consequences for
the assets involved.¹
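
One common way of quantifying this combination, assumed here for illustration
and not prescribed by the text, is to multiply the probability of a successful
attack by a numeric impact and to rank the risks by the resulting score. The
threat descriptions, probabilities and the impact scale below are hypothetical.

```python
def risk_score(likelihood: float, impact: float) -> float:
    """Risk as likelihood (a probability in [0, 1]) multiplied by impact."""
    if not 0.0 <= likelihood <= 1.0:
        raise ValueError("likelihood must be a probability between 0 and 1")
    return likelihood * impact


# (threat description, assumed likelihood, assumed impact on a 1-10 scale)
RISKS = [
    ("admin password brute-forced", 0.30, 8),
    ("file erased by mistake", 0.10, 5),
    ("unpatched firmware exploited", 0.60, 9),
]

# Prioritization: highest score first.
ranked = sorted(RISKS, key=lambda r: risk_score(r[1], r[2]), reverse=True)
for name, likelihood, impact in ranked:
    print(f"{risk_score(likelihood, impact):5.2f}  {name}")
```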


8.3 Trust Management Systems 


Trust management models aim to enable nodes that participate in the trust
management system to determine a trust metric value for other nodes within the
system. Trust models vary in how they compute trust with regard to numerous
aspects, including the input used to compute trust, the way that trust values
are updated, the consensus sought for trust value computation, the scale at
which trust is measured, their resilience against attacks and so forth.
Furthermore, trust management models vary with respect to the architectural
paradigm they follow, i.e., the way that the components participating in the
trust management system are deployed in the target network, the relationships
between the components and the information flows.

In the following subsections we survey existing trust models and their
architectures, commenting on their merits and demerits.


1. https://www.thebalancesmb.com/assets-definition-2947887


8.3.1 Review of Existing Trust Models


This section reviews the trust models that have been proposed in the
literature in search of an effective and efficient trust computation method. In
service-oriented networks, an IoT device acting as a service requester needs a
way of evaluating which of its peers can be trusted to provide it with the
requested service, while taking into consideration the energy demands of
carrying out such an evaluation. This is the challenge that trust management
models aim to solve. We present trust management models as seen in the
literature and we categorize each model by trust dimensions, resiliency against
certain attacks and qualitative characteristics.


8.3.1.1 Trust dimensions 


Trust models are composed of several trust dimensions which can vary between 
them depending on the approach followed. In this section we present the five 
most essential trust dimensions, namely, trust composition, trust propagation, trust 
aggregation, trust update and trust formation [14]. 


Trust composition. Refers to the components the model in question takes into
account. The components are Quality of Service (QoS) and Social trust.


• QoS trust refers to the trust level assigned to a node based on the
  evaluation of its competence in delivering the requested service. It is
  considered as the “objective” evaluation of trust. In order to compute QoS
  trust, models use various trust properties including competence,
  cooperativeness, reliability, task completion etc.

• Social trust refers to the social relationship between owners of IoT devices.
  Social trust is used in systems where IoT devices must not be evaluated only
  on a QoS basis but also on a social basis, which is the device’s commitment
  and willingness to cooperate. It can also be derived from similarity of
  devices. Social trust properties include connectivity, honesty, unselfishness
  etc.


Trust propagation. Refers to the way trust values are disseminated between entities. 
In general, there are two approaches, namely distributed and centralized. 


• In distributed trust propagation each device acts autonomously by storing 
trust values and disseminating them as recommendations to other devices as 
needed. 

• In centralized trust propagation a central entity exists, which is responsible 
for storing trust values of the monitored network and disseminating them as 
needed. 


Trust aggregation. Refers to the computation techniques used by a model to 
combine trust obtained from direct observation with indirect trust coming from 
recommendations. The main aggregation techniques include weighted sum, Bayesian 
inference, and fuzzy logic. 


• Weighted sum is a technique where weights are assigned to the participating 
values either statically or dynamically. For example, a model could use 
a trust property, e.g., competence, to assign higher or lower weights. 

• Bayesian inference considers trust to be a random variable which follows a 
probability distribution. It is a simple and statistically sound model. 

• Fuzzy logic uses approximate reasoning: instead of a binary evaluation 
variable it uses a variable whose values range, for example, between 0 and 1, 
or even linguistic terms such as High and Low, which are translated 
using a membership function. 
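
As an illustration of the first two techniques, the following minimal Python 
sketch computes direct trust as the mean of a Beta distribution over observed 
outcomes and then blends it with a recommendation through a static weighted 
sum. The function names, the uniform Beta(1, 1) prior and the weight 0.7 are 
assumptions made for this example; they are not taken from any of the surveyed 
models. 

```python
def beta_trust(positive: int, negative: int) -> float:
    """Mean of a Beta(positive + 1, negative + 1) distribution.

    Assumes a uniform prior, so a node with no history gets 0.5.
    """
    return (positive + 1) / (positive + negative + 2)


def weighted_sum(direct: float, indirect: float, w_direct: float = 0.7) -> float:
    """Blend direct and indirect trust with static weights that sum to 1."""
    return w_direct * direct + (1.0 - w_direct) * indirect


direct = beta_trust(positive=8, negative=2)   # trust from own observations
overall = weighted_sum(direct, indirect=0.5)  # blended with a recommendation
```

With 8 positive and 2 negative observations the direct trust is 0.75, and 
blending it with a recommendation of 0.5 yields an overall trust of 0.675. 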


Trust update. Describes when trust values are updated. There are two approaches: 
event-driven and time-driven. 


• Event-driven is the approach in which trust values are updated when an event 
occurs. 

• Time-driven is the approach in which trust values are updated periodically. 


Trust formation. Refers to how the overall trust is formed out of the trust properties 
considered. Trust can be formed by considering only one trust property (Single- 
trust) or many properties (Multi-trust). 


• Single-trust is when only one property is taken into consideration when 
computing trust, usually a QoS property. It is considered a narrow 
approach because trust is multi-dimensional, but it is useful in cases with 
limited resources. 

• Multi-trust is the multi-dimensional approach to computing trust, because it 
uses more than one trust property to form the overall trust evaluation of a 
device. 


8.3.2 Trust Management Models 


In this section we survey the different trust models proposed in the literature. 
For each model, the approach adopted for trust computation is presented; an 
overview is given in Table 8.1, while the salient features of the models are 
presented in detail in [31] (Table 3.5). 

Table 8.1. Overview of different trust models. 


Composition Propagation Aggregation Update Formation 


Model QoS Social Distrib Central Weigh Fuzzy Bayes E/T Sin Mul 


[15-18] X X X X E/T X 
[19,20] X X X X X E/T X 
[21] X X X X T X 
[22] X X X X T X 
[23] X X X E X 
[24] X X X T X 
[25] X X X X X E X 
[26] X X X E/T X 
[26] X X X E X 
[27] X X X T X 
[28] X X T X 
[29] X X X X E X 
[30] X X X X E X 


Bao, 2012 [17]. This model is proposed for Social IoT (SIoT) systems based 
on the Community of Interest (CoI) concept. A device has a single owner and an 
owner can have multiple devices. Owners keep a list of friends. Nodes that are 
part of similar communities have a better chance of having similar interests 
and capabilities. The authors consider both QoS and Social trust composition 
and define three trust properties: community-interest (Social), cooperativeness 
(QoS), and honesty (QoS); the interested reader is referred to [31] (Table 3.5) 
for more details. The trust value is a real number in the range [0,1] where 1 
indicates complete trust, 0.5 ignorance, and 0 distrust. The trust values are 
calculated by taking into account direct observations; when no direct 
observations are available, trust values can be sourced from recommendations. 
Trust aggregation is performed using weighted sums, while the model follows a 
distributed architecture. The weights applied to past experiences can be 
dynamically adjusted when new evidence arrives, to balance the trust 
convergence rate against the trust fluctuation rate. The simulation results 
show the effect of changing the weights, but a method for adjusting them 
dynamically is not described. 

Chen, 2016a [18]. This model is very similar to Bao, 2012. The main differences 
are: (1) a general approach for the computation of overall trust is not 
discussed; instead, overall trust computation is discussed for specific 
scenarios; (2) the friend (node) lists exchanged between nodes upon interaction 
are protected with a one-way (hash) function, so that nodes can identify only 
common friends, and hashing is cost-efficient; (3) the model is tested in two 
real-world scenarios, namely “Smart City Air Pollution Detection” and 
“Augmented Map Travel Assistance”. 

Bao, 2013 [19]. This model is proposed for Social IoT (SIoT) systems based on 
the Community of Interest (CoI) concept. A device can have only one owner and an 
owner can have multiple devices. Owners maintain personal friend lists. Nodes that 
are part of similar communities have a higher probability of sharing similar 
interests and capabilities. The authors consider both QoS and Social trust 
composition. The trust value is a real number in the range [0,1] where 1 
indicates complete trust, 0.5 ignorance, and 0 distrust. The trust properties 
considered are honesty, cooperativeness and community-interest; please refer to 
[31] (Table 3.5) for more details. Trust propagation is distributed. The model’s 
trust aggregation scheme uses Bayesian inference for the calculation of direct 
trust, and weighted sums for the aggregation of recommendations into indirect 
trust. An important aspect of this model is the introduction of a novel strategy 
for storage management which can be efficiently applied to large-scale IoT 
systems. 

Chen, 2016b [20]. This model is an extension of Bao, 2013 [19]. The extensions 
are: (1) in the evaluation of recommenders, it introduces two additional 
properties, namely friendship and social contact, which are further analyzed 
in [31] (Table 3.5); (2) in trust aggregation, it combines the direct with the 
indirect trust to form the overall trust; (3) in simulations, it outperforms 
EigenTrust [32] and PeerTrust [33] in trust convergence, accuracy, and attack 
resiliency. 

Chen, 2011 [21]. This model considers only QoS metrics for evaluating trust, 
namely, end-to-end packet forwarding ratio (EPFR), energy consumption (EC), 
and packet delivery ratio (PDR). Each node maintains a data forwarding 
transaction table which includes the values: (1) Source: the trust-evaluating 
nodes, (2) Destination: the evaluated destination nodes, (3) $RF_{i,j}$: the 
number of successful transactions made between nodes i and j, and (4) 
$F_{i,j}$: positive transactions. It follows a distributed scheme in terms of 
trust propagation. In trust aggregation, a fuzzy trust model is used, and the 
overall trust is formed using a weighted sum of direct trust and indirect trust 
based on recommendations. The direct trust is computed by first aggregating the 
aforementioned QoS metrics, then labeling the results as a positive or negative 
experience based on a threshold, and finally applying a fuzzy membership 
function that computes the direct trust from the number of positive and 
negative experiences. Additionally, the model was tested in simulations and 
achieved better performance than BITRM-WSN [34] and DRBTS [35] in both packet 
delivery ratio and detection probability of malicious nodes. 

Mahalle, 2013 [22]. This model considers three QoS metrics: Experience (EX), 
Knowledge (KN) and Recommendation (RC) ratings. It follows a distributed 
scheme, as every device considers the ratings of its neighbors for the calculation 
of the trust score. Trust is calculated periodically using Mamdani-type fuzzy rules 
(representing If-Then relationships between their input variables) from the 
linguistic values of the three aforementioned metrics. Trust scores (as 
linguistic values) are then mapped to a set of access control permissions. 
Experience (EX) is the weighted sum of a number of previous interaction ratings 
between two devices (+1 for a successful interaction and -1 for an unsuccessful 
interaction), Knowledge (KN) is the weighted sum of direct and indirect 
knowledge ratings, and Recommendation (RC) is the weighted sum of RC ratings 
from a number of devices about the device to be trusted. The three metrics are 
mapped to their linguistic variables using predefined numeric (crisp) ranges. 
The model was tested in a simulated environment of wireless sensors, with 
communication between sensors being controlled by trust ratings; this resulted 
in more energy-efficient communications and the model proved to be scalable. 
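
The pipeline described above (crisp metric, linguistic value, If-Then rules, 
permissions) can be sketched in Python as follows. This is a deliberately 
simplified illustration and not full Mamdani inference: it uses a crisp-range 
lookup instead of membership degrees and performs no defuzzification. The 
ranges, the rule base and the permission sets are all hypothetical and are not 
taken from [22]. 

```python
def experience(ratings, weights):
    """Weighted sum of past interaction ratings (+1 success, -1 failure)."""
    return sum(w * r for w, r in zip(weights, ratings))


def to_linguistic(value):
    """Map a crisp score in [-1, 1] to a linguistic term (hypothetical ranges)."""
    if value < -0.3:
        return "bad"
    if value < 0.3:
        return "average"
    return "good"


def trust_level(ex, kn, rc):
    """Apply simple If-Then rules (hypothetical) to the three linguistic inputs."""
    terms = [to_linguistic(v) for v in (ex, kn, rc)]
    if terms.count("good") >= 2 and "bad" not in terms:
        return "high"
    if terms.count("bad") >= 2:
        return "low"
    return "medium"


# Hypothetical mapping from linguistic trust to access-control permissions.
PERMISSIONS = {"high": {"read", "write"}, "medium": {"read"}, "low": set()}

ex = experience(ratings=[1, 1, -1, 1], weights=[0.1, 0.2, 0.3, 0.4])
level = trust_level(ex, kn=0.5, rc=0.0)
granted = PERMISSIONS[level]
```

Here the more recent interactions carry larger weights, so a single failure 
among mostly successful interactions still leaves the Experience metric in the 
"good" range. 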

Prajapati, 2013 [27]. This model forms trust values based on how satisfactory a 
node’s responses were to requests for specific services that were made to it: 
these satisfaction quantifications are combined to form the Direct Trust 
value. If a Direct Trust value is available, then this value is used; in the 
absence of a Direct Trust value, the Recommended Trust value is computed by 
sourcing and aggregating trust assessments from other peer nodes. In case the 
target node is joining the cloud for the first time, and therefore neither 
Direct Trust nor Recommended Trust values for it are available, a predefined 
Ignorance Value is used. Direct Trust is defined as the weighted sum of the 
service satisfaction ratings over time (with the weights decreasing over time, 
thus favoring newer ratings). Recommended Trust is defined as the weighted sum 
of the Direct Trust values of the other nodes. The weights used in the 
calculation of each Direct Trust value are based on two factors. The first one 
is the number of positive interactions between the two nodes (trustor and 
trustee). The second one is the Satisfaction Level, which depends on factors 
such as recovery time, maximum-load performance, connectivity and availability 
as provisioned by the service agreement. 

All nodes maintain a Direct Trust Table and a Recommended Trust Table 
containing the respective trust values, with both tables being updated 
periodically. This model follows a distributed scheme since, in the case of 
Recommended Trust, the trust values of all network nodes are considered. 
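
The fallback order described above (Direct Trust, then Recommended Trust, then 
the Ignorance Value) can be sketched as follows. This is a minimal sketch under 
stated assumptions: the weights depend only on rating age (the model in [27] 
also uses the number of positive interactions and the Satisfaction Level), 
peers are weighted equally, and the decay factor and Ignorance Value of 0.5 are 
hypothetical. 

```python
IGNORANCE_VALUE = 0.5  # hypothetical default for nodes with no history


def direct_trust(ratings, decay=0.5):
    """Time-weighted mean of satisfaction ratings, ordered oldest to newest."""
    if not ratings:
        return None
    n = len(ratings)
    # The newest rating gets weight 1; each older rating is discounted by `decay`.
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * r for w, r in zip(weights, ratings)) / sum(weights)


def recommended_trust(peer_direct_trusts):
    """Equal-weight aggregation of the Direct Trust values reported by peers."""
    if not peer_direct_trusts:
        return None
    return sum(peer_direct_trusts) / len(peer_direct_trusts)


def trust(ratings, peer_direct_trusts):
    """Direct Trust if available, else Recommended Trust, else Ignorance Value."""
    for value in (direct_trust(ratings), recommended_trust(peer_direct_trusts)):
        if value is not None:
            return value
    return IGNORANCE_VALUE
```

A node with no ratings and no recommendations, such as a newcomer, receives the 
Ignorance Value. 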

Saied, 2013 [26]. This model considers ratings given to a specific node and 
service at a given time while also taking into consideration its state (e.g., 
age, resource capacity, etc.). It follows a centralized scheme with a Trust 
Manager (TM) node receiving reports from the network and calculating the trust 
values on demand. This leads to reduced communication overhead, since trust 
values are calculated and transmitted on demand, and to lower memory usage on 
each node, since trust values can be requested again from the TM; both 
contribute to energy efficiency. The model operates in five phases: (1) the TM 
receives reports from the network nodes, (2) the TM calculates the trust values 
of a number of candidate nodes and sends a list of trustworthy 
nodes to the requesting node, (3) the requesting node receives the list and interacts 
with a chosen trustworthy node, (4) the requesting node rates the service provided 
by the chosen trustworthy node and sends the rating to the TM, and finally (5) 
the TM updates its trust values accordingly. Trust is calculated as the weighted 
average of the scores given to a node, taking into consideration the reputation 
of the node providing the score, the contextual similarity of all the reports 
concerning the same node, and the age of the report (favoring the most recent 
reports). Contextual similarity is calculated from the capabilities of two 
nodes (to locate similar nodes) and/or from the difference in required 
resources between two services (to locate nodes able to run a similar service). 
Initially all nodes of the network are deemed trustworthy. 
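
A minimal sketch of the Trust Manager's weighted average is given below. It 
assumes that the three factors are multiplied into a single weight and that 
report age is discounted exponentially; both choices, as well as the field 
names and the decay factor, are illustrative assumptions rather than the exact 
formulation of [26]. 

```python
from dataclasses import dataclass


@dataclass
class Report:
    score: float       # rating given to the evaluated node, assumed in [0, 1]
    reputation: float  # reputation of the reporting node, in [0, 1]
    similarity: float  # contextual similarity to the current request, in [0, 1]
    age: int           # age of the report in time steps (0 = newest)


def trust_value(reports, decay=0.9, default=1.0):
    """Weighted average of report scores.

    Each weight is reputation * similarity * decay**age (an assumed
    combination). With no usable reports the node keeps the default,
    since all nodes start out as trustworthy.
    """
    weights = [r.reputation * r.similarity * decay ** r.age for r in reports]
    total = sum(weights)
    if total == 0:
        return default
    return sum(w * r.score for w, r in zip(weights, reports)) / total


reports = [Report(score=1.0, reputation=1.0, similarity=1.0, age=0),
           Report(score=0.0, reputation=1.0, similarity=1.0, age=1)]
value = trust_value(reports)
```

Because the older, negative report is discounted, the resulting value lies 
slightly above 0.5. 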

Mendoza, 2015 [23]. This model is a distributed version of the model proposed 
by Saied et al. [26]. It is noted that centralized schemes may not be suitable for IoT 
systems as server installation and server costs may be prohibitive. The rating scheme 
of this model defines ratings for a specific node and service. The model’s operation 
comprises three phases: (1) nodes announce their presence to their neighbors and 
maintain a list of neighbors, (2) nodes request services from their neighbors and rate 
the interaction positively or negatively, and (3) nodes calculate and save trust values 
for their neighbors, based on these interactions. The response rating is defined as 
the fixed value of the provided service weighted by an adjusting factor, with the 
negative response rating being equal to two times the positive response rating. The 
provided service value is proportional to the processing requirements of the 
service: the more processing power or energy a service requires, the higher its 
service value. The trust value of a node is calculated as the sum of all 
interaction ratings. The model was tested against On-Off Attacks (OOA) and it 
is noted that a large number of neighbors can cause delays in the assignment of 
the maximum distrust score to the malicious nodes. 
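
The rating rule can be sketched as follows; the function names and the default 
adjusting factor are assumptions made for illustration. 

```python
def interaction_rating(service_value, success, adjust=1.0):
    """Rating for one interaction.

    A positive response earns adjust * service_value; a negative response
    is penalised with twice that magnitude.
    """
    positive = adjust * service_value
    return positive if success else -2.0 * positive


def trust(history, adjust=1.0):
    """Trust value of a neighbour: the sum of all interaction ratings."""
    return sum(interaction_rating(value, ok, adjust) for value, ok in history)


# Two good responses followed by one bad response for the same service.
history = [(1.0, True), (1.0, True), (1.0, False)]
score = trust(history)
```

In this example a single negative response cancels two positive ones, which is 
what makes alternating good and bad behaviour (as in an On-Off Attack) costly 
for the attacker. 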

Namal, 2015 [24]. This model considers four parameters: availability of 
resources to its users, reliability of produced information, response time 
irregularities, and capacity. It follows a centralized scheme with a Trust 
Manager (TM) module, hosted on the cloud, receiving filtered data from Trust 
Agents (TA) distributed on the network, which in turn receive raw data and 
monitor the state of the network nodes. The TM implements a Monitor, Analyze, 
Plan, Execute, Knowledge (MAPE-K) feedback control loop and calculates the 
trust as the weighted sum of all the trust parameters considered. Each trust 
parameter is itself a weighted sum of its current value and its previously 
calculated value. This model shows advantages in availability and accessibility 
(as the TMS is hosted on the cloud and is accessible from the internet), 
scalability (as the TMS utilizes TAs filtering the raw data), and flexibility 
(as the TAs can be deployed in a flexible manner). 
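
The two-level weighted sum can be sketched as follows. The parameter names, the 
equal weights and the smoothing factor 0.6 are hypothetical values chosen for 
the example. 

```python
def smooth(current, previous, alpha=0.6):
    """Weighted sum of the current and the previously calculated value."""
    return alpha * current + (1.0 - alpha) * previous


def trust(parameters, weights):
    """Weighted sum over all trust parameters; weights are assumed to sum to 1."""
    return sum(weights[name] * value for name, value in parameters.items())


WEIGHTS = {"availability": 0.25, "reliability": 0.25,
           "response_time": 0.25, "capacity": 0.25}

previous = {name: 0.5 for name in WEIGHTS}
current = {"availability": 1.0, "reliability": 0.5,
           "response_time": 0.0, "capacity": 0.5}

# Smooth every parameter first, then combine the smoothed values.
updated = {name: smooth(current[name], previous[name]) for name in WEIGHTS}
score = trust(updated, WEIGHTS)
```

Smoothing each parameter against its previous value damps sudden changes, so a 
single bad reading for response time lowers the parameter to 0.2 rather than 0. 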
Khan, 2017 [26]. This model considers ratings given to a node by its neighbors; 
these ratings are the combination of three variables: belief, disbelief and 
uncertainty, as defined in Jøsang’s Subjective Logic. This model is proposed as 
part of an extension of the RPL routing protocol, which utilizes the proposed 
model to isolate malicious nodes. It follows a centralized scheme with a 
central node (e.g., RPL border router or cluster-head) calculating trust values 
for all network nodes and deciding to isolate malicious nodes. Each node of the 
network is assumed to be able to detect and therefore rate the performance of 
its neighboring nodes. The three aforementioned variables are defined as 
follows: belief is the number of positive interactions divided by the sum of 
the total number of interactions and a constant k; disbelief is defined 
similarly, but with the number of negative interactions in place of the 
positive ones; and uncertainty is also defined similarly, but with the constant 
k in place of the number of positive/negative interactions. The central node 
calculates the trust value of each network node by combining the trust values 
reported about that node and, using a threshold, isolates malicious nodes from 
the network. 
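
Written out, with p positive and n negative interactions, the three variables 
are b = p / (p + n + k), d = n / (p + n + k) and u = k / (p + n + k), so that 
b + d + u = 1. A minimal Python sketch follows; the default k = 2 is an 
assumption of this example. 

```python
def opinion(positive, negative, k=2.0):
    """Return (belief, disbelief, uncertainty) for a rated neighbour.

    All three share the denominator positive + negative + k, so they
    always sum to 1. The default k is an assumption of this sketch.
    """
    total = positive + negative + k
    return positive / total, negative / total, k / total


belief, disbelief, uncertainty = opinion(positive=6, negative=2)
```

With 6 positive and 2 negative interactions the opinion is (0.6, 0.2, 0.2); 
with no interactions at all, the opinion is pure uncertainty. 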

Djedjig, 2017b [36]. This model considers two QoS parameters, selfishness 
and energy, and one social parameter, honesty, as ratings given about a node by 
its neighbors. This model is a proposed extension of the RPL routing protocol, as in 
Khan et al. [21], to isolate malicious nodes. It follows a distributed scheme with each 
node calculating the trust values of its one-hop neighbors while also considering the 
trust values reported by those neighbors. Trust calculation is performed as follows: (1) 
each node calculates the direct trust values of its one-hop neighbors as a weighted 
sum of the honesty, energy and unselfishness metrics (definitions of which are not 
discussed in detail) with each metric being the weighted sum of the current value 
of the metric and the previous value of the metric, (2) each node receives the direct 
trust values calculated by its one-hop neighbors concerning the node to be rated, 
and (3) the indirect trust is then calculated by each node as the average of the direct 
trust calculated by the node itself and its neighbors. All nodes are assumed to be 
equipped with Trusted Platform Module (TPM) chips. 

Medjek, 2017 [14]. This model is based on the one proposed by Djedjig 
et al. [36] but differs in the metrics considered: honesty, energy and 
mobility. The main difference is the network architecture, as this model 
applies to RPL networks consisting of a Backbone Router (BR) that federates 
multiple 6LoWPAN networks, each consisting of a 6LoWPAN Border Router (6BR) 
connected to the BR and the rest of the network nodes. This model follows a 
distributed scheme with each network node calculating the trust of its one-hop 
neighbors, as in [36], with the added steps of notifying its 6BR if a node is 
found to be untrustworthy and with the 6BR in turn notifying the BR of the 
malicious node. All nodes are assumed to be equipped with a Trusted Platform 
Module (TPM) and all nodes are registered with the BR at installation time, 
with every node having a unique ID assigned by the BR. Several lists are 
maintained by the various network nodes: the BR maintains two lists, one of 
potential malicious nodes and one of all nodes and their states; the 6BR 
maintains three lists, one of all 6BR area nodes, one of all the mobile nodes, 
and one of the potential malicious nodes; finally, the remaining nodes also 
maintain three lists, one of potential malicious nodes, one of suspicious nodes 
and a copy of the mobile node list from the 6BR. Three modules operate on the 
various network nodes: IdentityMod controls access to the network and ensures 
that every node has a unique ID, MobilityMod ensures that both the BR and the 
6BRs are aware of mobile nodes and of their status, and IDSMod is responsible 
for attack detection and mitigation. Trust is calculated in a similar fashion 
to [36], with the values of the honesty metric supplied by the IDSMod and the 
values of the mobility metric supplied by the MobilityMod; the three metrics 
are not discussed in detail. 

Nitti, 2014 [25]. This work proposes two models, namely the “subjective” 
model and the “objective” one. These models consider the following parameters: 
(i) node credibility, (ii) service ratings, (iii) transaction factor 
(identifying which transactions are important, to prevent trust levels from 
increasing merely through many small transactions), (iv) number of transactions 
per node (to detect abnormalities in the number of transactions for a given 
node), (v) computation capacity (nodes with higher computational capabilities 
can inflict more damage if they are malicious), (vi) the notion of centrality 
(a node plays a more central role if it is involved in many connections or 
transactions in the network), and (vii) the relationship factor (considering 
the type of relationship between two nodes). 

The subjective model follows a distributed scheme where each node stores the 
necessary information to calculate the trust values locally. Two situations are covered 
relating to the social relationship between nodes: when the rating node has a social 
relationship with the rated node and when the two nodes have no direct social 
relationship. In the first situation trust depends on: the centrality of the 
rated node in relation to the rating node (the count of common friends out of 
all the neighboring nodes), the direct experience of the rating node (further 
defined as the weighted sum of both short-term and long-term opinions), and the 
indirect experience of the rating node’s friends (defined as the weighted 
average of the trust values assigned to the rated node by the rating node’s 
friends, weighted by their credibility). In the second situation trust depends 
on the opinions of the chain of common friends connecting the two nodes, again 
weighted by their credibility. Generally, after each transaction a rating 
(positive/negative) is given to the node providing the service and to the nodes 
whose opinion was considered in calculating the trust value. Negative 
recommendation ratings are given both to malicious nodes and to nodes in their 
neighborhood, thus further isolating the malicious nodes and their influence. 
The objective model follows a more centralized scheme where each node reports 
its feedback to special nodes, referred to as Pre-Trusted Objects (PTO), which 
are responsible solely for maintaining the distributed storage system, in this 
case a Distributed Hash Table (DHT) and more specifically one following the 
Chord architecture. Trust is calculated in a similar fashion as in the 
subjective model; node centrality is defined as the total number of 
transactions performed by the node to provide a service divided by the total 
number of transactions performed to either provide or request a service, and 
both short-term and long-term opinions consider the ratings of every network 
node weighted by their credibility. Nodes with few social relations, nodes with 
high computation capabilities and nodes involved in a large number of 
transactions between them are assigned low credibility, as they are more likely 
to become malicious. 

Wu, 2017 [28]. The system model consists of four entities with three trust 
relationships among them. The four entities are: RFID tags, RFID readers, 
authentication centers and one administration center, with the first three being 
grouped in domains. A domain has multiple RFID readers connected with the 
domain authentication center which authorizes the readers to interact with the 
RFID tags, and the domain authentication centers are connected with the 
administration center. The trust relationships of this system model are defined 
as: intra-domain trust (the trust relationship between RFID tags and readers of 
the same domain), inter-domain trust (the trust relationship between 
authentication centers), and cross-domain trust (the trust relationship between 
RFID tags and readers belonging to different domains). 

The trust management model consists of two layers: the authentication center 
trust layer, a centralized trust management system managing the trustworthiness 
of authentication centers, and the reader trust layer, for which two trust 
management schemes are proposed to manage the trustworthiness of RFID readers. 
The RFID tags are always assumed to be trusted. 

The first reader trust management layer scheme proposed uses the 
Dempster-Shafer evidence theory and consists of four steps: (1) the interaction 
of an RFID reader is recorded by its neighbors, (2) the neighbors calculate the 
local trust values, which are then transmitted to the authentication center, 
(3) the authentication center calculates the global trust of the RFID reader by 
using Dempster’s combination rule, and finally (4) if the RFID reader is 
malicious or malfunctioning the administration center is notified. Possible 
RFID reader interaction events are identified and marked as malicious behavior, 
malfunctioning behavior or normal behavior by the neighboring RFID readers, 
each counting the number of events within a specified time frame. Using the 
number of recorded events, the neighboring RFID readers can calculate the local 
trust value for each type of interaction event as the number of events marked 
as malicious/malfunctioning/normal divided by the total number of recorded 
events. The final value of the local trust value is then chosen 
from the event-specific local trust values using a threshold. The 
authentication center calculates the global trust of the RFID reader by 
aggregating the event-specific local trust scores calculated by the neighboring 
RFID readers and then choosing the final integrated event-specific score using 
a threshold. 
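
Steps (2) and (3) can be illustrated with the standard form of Dempster's rule 
of combination. The sketch below is a simplification under stated assumptions: 
each neighbor assigns mass only to the three singleton hypotheses, exactly two 
neighbors report, and the final verdict is taken as the hypothesis with the 
largest combined mass rather than by the threshold test used in [28]. All names 
and event counts are hypothetical. 

```python
from itertools import product

M = frozenset({"malicious"})
F = frozenset({"malfunctioning"})
N = frozenset({"normal"})


def local_masses(counts):
    """Turn per-type event counts into masses: count / total recorded events."""
    total = sum(counts.values())
    return {event: n / total for event, n in counts.items()}


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions."""
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + x * y
        else:
            conflict += x * y  # mass assigned to contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise by the non-conflicting mass.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


reader_a = local_masses({M: 6, F: 1, N: 3})  # events seen by neighbour A
reader_b = local_masses({M: 5, F: 2, N: 3})  # events seen by neighbour B
global_trust = combine(reader_a, reader_b)
verdict = max(global_trust, key=global_trust.get)
```

Both neighbors lean towards malicious behavior (0.6 and 0.5), and combining 
their evidence strengthens that conclusion to roughly 0.73. 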

The second reader trust management layer scheme proposed considers the fact 
that events may not be detected by neighbors of the RFID reader and thus the 
first reader trust management layer scheme may not be applicable to certain 
situations. Each RFID tag keeps a record of its last interaction with an RFID 
reader, more specifically the RFID reader ID, a timestamp and the rating 
assigned to the RFID reader by the tag. This record is sent the next time the 
RFID tag interacts with any RFID reader (and is then deleted from the RFID 
tag), with the RFID reader forwarding the record to its authentication center, 
which checks for abnormalities and, if any problem arises, notifies the 
administration center as well as the authentication center to which the 
previous RFID reader belongs. 

The proposed authentication center trust layer scheme considers abnormal event 
reports by RFID readers and affects the trust value of the domain authentication 
center the readers are part of. Calculation of trust in this case can be performed by 
either of the two methods proposed for the reader trust management schemes. 

Mahmud, 2018 [30]. This model considers three social trust metrics for a pair 
of nodes, namely relative frequency of interaction, intimacy and honesty, and the 
deviations of generated data from the historical data of the node that generated the 
trust metric and its neighbors. Two trust dimensions are defined: node behavioral 
trust and data trust, both calculated by combining direct (from the rating node) 
and indirect (from the rating node’s neighbors) interactions, with indirect 
interactions being weighted by the distance of the neighbor to the rated node. 
Node behavioral trust is calculated using an Adaptive Neuro-Fuzzy Inference 
System (ANFIS), a fuzzy system using back propagation to tune itself. The three 
inputs to ANFIS are defined as follows: relative frequency of interaction is 
the ratio of interactions with the rating node out of all interactions of the 
rated node in a given time period, intimacy is the ratio of the amount of time 
spent interacting with the rating node out of the total time spent interacting 
with all nodes except the rating node, and honesty is the ratio of successful 
interactions out of the total number of interactions of the rated node with its 
rating node. Three linguistic terms are used in ANFIS for each of the three 
inputs: Low, Medium and High. Deviations of generated data, used to calculate 
the data trust, are defined as follows: direct data trust is the deviation of 
instantaneous data from the historical data generated by the rated node, and 
indirect data trust is the deviation of instantaneous data from the historical 
data generated by the rated node’s neighbors. 
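
The three ANFIS inputs are plain ratios and can be computed as in the sketch 
below. Only the input computation is shown; the ANFIS stage itself is not 
reproduced. The function and argument names are hypothetical, and returning 0.0 
for an empty denominator is an assumption of this example. 

```python
def ratio(part, whole):
    """Ratio that falls back to 0.0 when nothing has been observed yet."""
    return part / whole if whole else 0.0


def relative_frequency(interactions_with_rater, all_interactions):
    """Share of the rated node's interactions that involved the rating node."""
    return ratio(interactions_with_rater, all_interactions)


def intimacy(time_with_rater, time_with_other_nodes):
    """Time spent with the rating node relative to time spent with all others."""
    return ratio(time_with_rater, time_with_other_nodes)


def honesty(successful_interactions, interactions_with_rater):
    """Share of interactions with the rating node that were successful."""
    return ratio(successful_interactions, interactions_with_rater)


inputs = (relative_frequency(5, 20), intimacy(30.0, 120.0), honesty(9, 10))
```

Each value would then be mapped to the linguistic terms Low, Medium and High 
before being fed to the fuzzy inference stage. 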
Arabsorkhi, 2016 [37]. The work of Arabsorkhi et al. presents the general 
principle behind many proposed trust management models that consider ratings 
given to network nodes for the quality of the services provided over a specific 
time period. If the rating node has enough information to determine the trust 
value from its own ratings over the specified time period (by direct 
observation), it can proceed to calculate the trust value of the node to be 
rated. If not, then the rating node can query the rest of the network and 
aggregate the trust values assigned by the other network nodes to the rated 
node. 

Yuan, 2018 [29]. This model considers ratings given after node interaction for 
the quality of provided services. The network model consists of IoT edge nodes 
being part of a domain federated by an edge broker node, which in turn contacts a 
central cloud server responsible for the final calculation of trust values. Three trust 
values are calculated: the direct trust of one device in another device (D-to-D 
direct trust), the feedback trust about a node held by an edge broker (B-to-D 
feedback trust), and the overall trust (the final trust value) about a device. 
D-to-D direct trust is updated based on the history of direct interactions 
between nodes; it is defined as the ratio of positive interactions to the total 
number of interactions between the two nodes. B-to-D feedback trust is updated 
by the edge broker periodically and is based on all the D-to-D direct trust 
values concerning an edge node (except self-ratings); the edge broker 
aggregates the D-to-D direct trust values using weights derived by use of 
objective information entropy theory, overcoming the limitations of assigning 
the weights manually. The overall trust value is calculated as the weighted 
sum of the D-to-D direct trust and the B-to-D feedback trust, thus considering the 
opinion of the rating node as well as the opinion of the whole network about the 
rated node. 
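
The three trust values can be sketched as follows. The entropy-based derivation 
of the feedback weights in [29] is not reproduced here: the weights are passed 
in by the caller, and the equal split between direct and feedback trust is a 
hypothetical default. 

```python
def d2d_direct_trust(positive, total):
    """Ratio of positive interactions to all interactions between two devices."""
    return positive / total if total else 0.0


def b2d_feedback_trust(direct_values, weights):
    """Broker-side aggregation of D-to-D values about one device.

    `weights` stands in for the entropy-derived weights of the model;
    they are normalised here so that they sum to 1.
    """
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, direct_values)) / total


def overall_trust(direct, feedback, w_direct=0.5):
    """Weighted sum of the rater's own view and the network-wide feedback."""
    return w_direct * direct + (1.0 - w_direct) * feedback


direct = d2d_direct_trust(positive=8, total=10)
feedback = b2d_feedback_trust([0.6, 0.9], weights=[1.0, 2.0])
final = overall_trust(direct, feedback)
```

In this example both the rater's own experience and the aggregated feedback 
evaluate to 0.8, so the overall trust is 0.8 regardless of how the two are 
weighted. 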


8.4 Trust Management System 


The objective of the trust management system is to serve as an authority within 
the protected Internet of Things infrastructure perimeter, which undertakes the 
following tasks: 


• Consolidates observations on the status, behaviour and associated risk of 
devices into a comprehensive trust score, which indicates the degree to which 
each device is deemed to be trustworthy. 

• Can be queried by other entities within the protected Internet of Things 
infrastructure perimeter, to provide the abovementioned assessments, for the 
perusal of the entities. Indicatively, trust assessments can be used for the 
visualization of trust within the network, for making decisions whether actions 
originating from or being directed to some device should be allowed or not, 
for raising alerts to security officers and so forth. 

• Provides timely notifications to other entities within the protected Internet 
of Things infrastructure perimeter, to alert them of noteworthy events related 
to the level of trust associated with devices. In particular, notifications are 
emitted when a device's trust level is demoted below some threshold and when 
previously demoted trust is restored, allowing relevant components of the 
protected Internet of Things infrastructure perimeter to take appropriate 
actions, such as enabling or disabling defence mechanisms. 


Figure 8.1. SIEM platform elements providing information to the TMS. 


8.4.1 TMS Context 


The TMS is envisioned to operate in the broad context of a platform following 
the Security Information and Event Management System (SIEM) principles [38], 
sourcing information required for its operation from other platform modules, as 
depicted in Figure 8.1. 

In more detail, the information sourced from other platform elements, which 
act as security information and event management (SIEM) providers is as follows: 


• Platform users provide information regarding the peer users they trust, the peer
TMSs that are trusted and explicit device trust specifications. Naturally, user
interaction with the TMS is mediated through an appropriate application.

• The CyberDefense module provides data regarding the network anomalies
detected (deviations from the nominal device and network behaviour), the
non-compliant traffic (traffic flows that have not been whitelisted as “acceptable
behaviour” for the device) and network attacks (primarily in the context
of signature-based detection), either originating from some device or targeted
against it.

• The iIRS (intelligent Intrusion Response System) module provides information
regarding the devices that are in the scope of the TMS, their importance, the
vulnerabilities existing on devices, events of device compromises, as well as
network topology and reachability information.

• The eVDB (extended Vulnerability DataBase) module provides information on
the detected vulnerabilities, including their impact, underpinning the
assessment of the impact that vulnerabilities may have on the trust level of the
affected device.

• The Device profile repository provides information on the cases where a device
is removed from the system and when the device health is restored after a
compromise (i.e. the malware is removed or “clean” versions of the operating
system/firmware are installed).

• A peer TMS, acting as a trusted peer entity, provides trust assessments, which are
combined by the receiving TMS instance with its own device trust estimations
to synthesize a comprehensive trust score.


The TMS, in turn, publishes information regarding changes in the trust level
of the devices through the SIEM platform information bus (a pub/sub component
that delivers specific types of information published to it to entities that have
registered their interest in receiving these types of information), as depicted in
Figure 8.2. This information can be exploited as follows:


• SIEM platform operator and end-user interfaces may use this information to
generate alerts, especially in the cases of noteworthy trust demotion.

• Defence mechanisms, and in particular the iIRS, can exploit this information
to apply or disable restrictions in network traffic.

• The Device repository updates its own database, guaranteeing information
consistency and dissemination of the trust level to any other interested
component.

• Peer TMSs can use this information to update their trust assessments.


Figure 8.2. TMS outgoing information flows.

Figure 8.3. TMS high-level design.
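
The outgoing flow described above can be illustrated with a minimal in-process
sketch of a pub/sub bus. It is not the platform's actual information bus: the class
names, the topic name, the message fields and the demotion threshold of 0.5 are all
assumptions introduced only for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List, Set


@dataclass(frozen=True)
class TrustLevelChange:
    """Hypothetical message emitted when a device's trust level changes."""
    device_id: str
    old_level: float
    new_level: float


class InformationBus:
    """Toy stand-in for the SIEM information bus (topic-based pub/sub)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[[TrustLevelChange], None]]] = (
            defaultdict(list)
        )

    def subscribe(self, topic: str, handler: Callable[[TrustLevelChange], None]) -> None:
        # An entity registers its interest in one type of information.
        self._handlers[topic].append(handler)

    def publish(self, topic: str, message: TrustLevelChange) -> None:
        # Deliver the message to every entity registered for this topic.
        for handler in self._handlers[topic]:
            handler(message)


TOPIC = "trust.level.changed"   # assumed topic name
DEMOTION_THRESHOLD = 0.5        # assumed threshold on a [0, 1] trust scale

alerts: List[str] = []
restricted: Set[str] = set()


def notify_operator(change: TrustLevelChange) -> None:
    # UI side: alert only when trust drops below the threshold.
    if change.old_level >= DEMOTION_THRESHOLD > change.new_level:
        alerts.append(f"Trust of {change.device_id} demoted to {change.new_level}")


def adjust_restrictions(change: TrustLevelChange) -> None:
    # Defence side: apply or lift traffic restrictions for the device.
    if change.new_level < DEMOTION_THRESHOLD:
        restricted.add(change.device_id)
    else:
        restricted.discard(change.device_id)


bus = InformationBus()
bus.subscribe(TOPIC, notify_operator)
bus.subscribe(TOPIC, adjust_restrictions)
bus.publish(TOPIC, TrustLevelChange("camera-01", old_level=0.8, new_level=0.3))
```

Because consumers only register handlers, the publisher needs no knowledge of which
interfaces, defence mechanisms or peer TMSs react to a change.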


8.4.2 TMS Application Architecture 


Figure 8.3 illustrates the conceptual view of the Trust Management System. Its
architecture is designed to expose a coherent API, enabling any adaptation
aspects to be implemented internally considering all the appropriate contexts
(network and resource availability, situation criticality, etc.). The information
needed to recompute the trust and risk scores, including device status, behaviour
and associated risk aspects, is mainly received through asynchronous messaging
over a dedicated communication channel, following the pub/sub paradigm. In
this way, the TMS is decoupled from event producers and their timings; however,
content consumption via APIs can also be used. Reciprocally, the TMS publishes
events regarding notable changes of trust and risk levels, while also offering the
same information through REST APIs. Adaptation, where needed, will be supported
by an adaptation component to be developed and maintained separately from the
computational aspects, promoting separation of concerns.
Figure 8.4 depicts the data view of the TMS, indicating: 


(a) the data maintained internally in the TMS database;

(b) the messages that the TMS subscribes to in order to obtain the necessary
information to compute trust and risk levels, as well as the sources of these
messages, according to the overall SIEM system architecture;

(c) the information that the TMS receives directly from the users (typically,
through a UI);

(d) the messages that the TMS makes available to the asynchronous
communication infrastructure, for the perusal of other Cyber-Trust components.


Figure 8.4. TMS data view.


Trusted peer TMSs are curated directly by users. Users additionally provide
information regarding other trusted entities in the platform: this pertains to modules
that publish asynchronous messages to the information bus, which the TMS is
expected to consume. Each trusted entity specification provides the data needed
by the TMS to verify the authenticity and integrity of received messages, i.e. the
name of the peer and its certificate. While users are not commonly expected to be
proficient with such data, automated procedures upon the setup of the platform
are expected to relieve the user of the task of manually setting up this
information. Should updates to this information be needed, automations, configuration
assistants and wizards may also ease the task of the users.


8.4.3 TMS Design


Figure 8.5 illustrates the entities involved in the proposed trust model and the
relationships between them. The elements may appear in the context of IoT,
Smart Home, or SOHO environments and include:


Figure 8.5. The entities in the proposed trust model and their relationships.


• Devices, which function within the considered environment.

• Users, who own devices. A single user can own many devices. Users can establish
trust relationships between them, with these relationships having the following
properties: (a) they are weighted, (b) they are directed, (c) they are not
transitive and (d) they are not necessarily symmetrical. The following example
illustrates these properties:


o User u1 states that s/he trusts another user u2. This is done by providing a
trust level, which expresses u1’s confidence that u2 will not perform malicious
actions against u1, or will even take actions that have positive effects
on u1.

o The declaration of trust of u1 towards u2 does not necessarily mean
that u2 also trusts u1, expressing the fact that trust may not be reciprocated
[23]. It is still however possible that u2 makes a separate, independent
assertion that s/he trusts u1; such an assertion may express a different
trust level than the respective assertion made by u1.

o Trust is not transitive: if u1 trusts u2 and u2 trusts u3, no assumption is
made that u1 trusts u3. An explicit assertion by u1 is required to establish
any trust relationship to any other user in the domain of discourse.


• Trust Management System instances (TMS): TMSs are effectively software agents
which perform trust level computations towards devices within the considered
environment. The trust value computation for a device is performed by considering
multiple factors which are either collected through monitoring the activities
within the environment or explicitly provided. The factors taken into
consideration are:


o the device status: this includes (1) information about the integrity
of the device, i.e. information attesting the legitimacy of the
software/firmware/operating system and its configuration, as opposed to the
aforementioned components being compromised; and (2) information
on the device’s resilience, i.e. if the device’s software/firmware/operating
system/configuration have any known vulnerabilities, as opposed to the
case that no known vulnerabilities are present.

o the device behaviour: this encompasses the following information:


1. if the device has been reported to perform attacks or has been identi- 
fied to be the target of attacks. 

2. if the device's resource utilization metrics comply with a predefined 
specification which defines what constitutes normal behaviour or if 
they diverge from it. Some examples of these metrics include, but are 
not limited to, network usage, CPU load, and disk activity. Practically, 
any class of system metrics that can be quantified, and for which base- 
line metrics can be created so as to allow computation of deviations 
from the baselines is eligible for incorporation within this dimension. 
Similar practices are widely employed in monitoring infrastructures, 
such as Nagios [39] and may include metrics such as number of con- 
nected users, amount of free disk, total number of processes, number 
of processes corresponding to some specific service instance, etc. 

3. if the device’s behaviour conforms to some predefined reference
behaviour that is whitelisted as “normal”. MUD specification files [5] 
can provide such information, nevertheless they have not been widely 
adopted and manufacturer support is lacking. 


o the risk associated with the device: IoT devices may become targets of
attacks and some attacks may succeed. A probability indicating that a 
device will eventually be compromised can be calculated by considering 
technical information such as its vulnerabilities and its reachability inside 
the network. Attack graphs can be utilized to this end [36]. The level of 
impact of a successful attack on an organization/person owning a device is 
not always the same and can vary depending on the perceived value of the 
device. The perceived value of the device is directly linked with the assets 
it encompasses (e.g. the value of a device hosting a database is dependent 
on the value of the data in the database) or with the value/criticality pro- 
cesses it is responsible for (e.g., a vital signs monitor on a smart watch vs. 
a vital signs monitor used in remote surgery). 

Another aspect that must be considered when calculating the risk associated
with a device d is the set of devices that are accessible through it,
and whether it would be possible for attackers to use it as a bastion from
where they assault other devices, attempting to compromise devices of
high value in the context of more advanced, multi-staged attacks. In this
respect, the risk associated with d is dependent on (a) the probability that
d is compromised itself, (b) the probability that devices reachable from d
are compromised in the context of a multi-stage attack and (c) the perceived
value of devices reachable from d.

Taking the above into account, the associated risk dimension com- 
bines the above-mentioned aspects i.e. (i) the technical probability that 
the device is compromised with the perceived value of the device, and (ii) 
the probability that the device is used as a stepping stone to attack other 
devices, in conjunction with the business values of the assets associated to 
these devices, to synthesize a single, comprehensive metric expressing the 
business risk applicable to a device. 

o The trust relationship between the user that owns a device running a 
TMS instance and the user whose device is under trust evaluation. This 
aspect moderates the weight of trust level assessments, so that trust level 
assessments sourced from trusted TMSs (i.e. TMSs running on devices 
belonging to trusted users) are taken more strongly into account, while 
the importance of trust assessments sourced from non-trusted TMSs (i.e. 
TMSs running on devices belonging to users of unknown or low trust) is 
attenuated. 
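
The associated risk dimension described above can be made concrete with a small
sketch. The text does not prescribe a formula, so the additive combination below
(own risk plus onward risk through reachable devices) is only one assumed way of
joining compromise probabilities with perceived values; all names and numbers are
hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DeviceRiskInput:
    """Hypothetical per-device inputs to the risk computation."""
    p_compromise: float                 # probability that the device is compromised
    value: float                        # perceived (business) value of the device
    # reachable device id -> probability that it is compromised once this device
    # has been compromised (next step of a multi-stage attack)
    reachable: Dict[str, float] = field(default_factory=dict)


def business_risk(device_id: str, devices: Dict[str, DeviceRiskInput]) -> float:
    """Assumed combination: own risk + risk of being used as a stepping stone."""
    device = devices[device_id]
    # (i) probability that the device is compromised, weighted by its own value
    own_risk = device.p_compromise * device.value
    # (ii) probability of an onward attack, weighted by the value of each target
    onward_risk = sum(
        device.p_compromise * p_next * devices[target].value
        for target, p_next in device.reachable.items()
    )
    return own_risk + onward_risk


devices = {
    "camera": DeviceRiskInput(p_compromise=0.5, value=2.0, reachable={"nas": 0.5}),
    "nas": DeviceRiskInput(p_compromise=0.1, value=8.0),
}
camera_risk = business_risk("camera", devices)  # 0.5 * 2.0 + 0.5 * 0.5 * 8.0
```

In this example the low-value camera carries more risk than its own value suggests,
because it offers a path to the high-value storage device.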


An overall trust assessment is formed by the TMS instances by synthesizing the 
three trust dimensions: (i) status-based, (ii) behaviour-based, and (iii) associated 
risk-based trust. 
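
As an illustration of this synthesis, the sketch below combines the three
dimension-specific scores with a weighted mean. The weighted mean, the [0, 1] score
range and the weights are assumptions made for the example; other composition
approaches are equally possible.

```python
from typing import Sequence


def overall_trust(
    status: float,
    behaviour: float,
    risk_based: float,
    weights: Sequence[float] = (1.0, 1.0, 1.0),
) -> float:
    """Weighted mean of the three dimension scores (assumed composition rule)."""
    scores = (status, behaviour, risk_based)
    if any(not 0.0 <= score <= 1.0 for score in scores):
        raise ValueError("each dimension score must lie in [0, 1]")
    if len(weights) != 3 or sum(weights) <= 0:
        raise ValueError("three weights with a positive sum are required")
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# Status-based trust counts twice as much as the other two dimensions here.
score = overall_trust(status=1.0, behaviour=0.5, risk_based=0.0, weights=(2.0, 1.0, 1.0))
```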

Furthermore, trust relationships can be established between TMS instances, in
the same fashion that trust relationships are established between users. Similarly
to the case of user-to-user trust relationships, TMS-to-TMS trust relationships are
(a) weighted, (b) directed, (c) non-transitive and (d) not necessarily symmetrical.
The trust relationships between TMS instances are explicitly provided by the users
owning the devices on which TMS instances are run. Once a trust relationship
stating that TMS instance T1 trusts TMS instance T2 is established, T1 will source
trust assessments for devices from TMS T2, and take them into account when
computing the respective devices’ trust levels.
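
The properties of these relationships, and the way peer assessments may be taken
into account, can be sketched as follows. The class and function names are
hypothetical, trust levels are assumed to lie in [0, 1], and the weighted mean used
to merge peer assessments is an assumption rather than the rule used by the TMS.

```python
from typing import Dict, Optional, Tuple


class TrustRelationships:
    """Explicit, weighted, directed trust assertions between named parties."""

    def __init__(self) -> None:
        self._levels: Dict[Tuple[str, str], float] = {}

    def assert_trust(self, truster: str, trustee: str, level: float) -> None:
        # Assumption: trust levels are normalised to the range [0, 1].
        if not 0.0 <= level <= 1.0:
            raise ValueError("trust level must lie in [0, 1]")
        self._levels[(truster, trustee)] = level

    def level(self, truster: str, trustee: str) -> Optional[float]:
        # Only explicit assertions count: neither symmetry nor transitivity
        # is ever inferred.
        return self._levels.get((truster, trustee))


def combine_with_peers(
    own_assessment: float,
    peer_assessments: Dict[str, float],
    relationships: TrustRelationships,
    local_tms: str,
) -> float:
    """Weighted mean of own and peer assessments (assumed combination rule)."""
    weighted_sum, total_weight = own_assessment, 1.0  # own assessment has weight 1
    for peer, assessment in peer_assessments.items():
        # Peers without an explicit trust assertion contribute nothing.
        weight = relationships.level(local_tms, peer) or 0.0
        weighted_sum += weight * assessment
        total_weight += weight
    return weighted_sum / total_weight


tms_trust = TrustRelationships()
tms_trust.assert_trust("T1", "T2", 0.5)
tms_trust.assert_trust("T2", "T3", 0.9)

# T1 trusts T2 but has made no assertion about T3, so T3's opinion is ignored.
combined = combine_with_peers(0.75, {"T2": 0.25, "T3": 0.0}, tms_trust, "T1")
```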

Finally, users are allowed to set explicitly the trust level of the devices they 
own, overriding the computations made by the TMS. This provision is accom- 
modated to handle false positives mainly related to network attacks (an attack is 
flagged by relevant modules but was not actually performed), network anomalies 
(e.g. excessive traffic was detected but this was due to a user-initiated backup or a 
software/firmware update) and compromises (e.g. some software on the device was 
misclassified as malware). The TMS will be able to provide both the automatically 
computed and the explicit trust level of the device, so that relevant applications 
will be able to detect devices where major discrepancies exist and keep the users
informed about such deviations, promoting awareness and facilitating intervention, 
as needed. 
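
A minimal sketch of this provision is given below. The record layout, the [0, 1]
trust scale and the discrepancy tolerance of 0.3 are assumptions introduced for
illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeviceTrust:
    """Hypothetical record holding both trust levels of a device."""
    computed: float                   # level computed automatically by the TMS
    explicit: Optional[float] = None  # level set explicitly by the owner, if any

    @property
    def effective(self) -> float:
        # An explicit level set by the owner overrides the computed one.
        return self.explicit if self.explicit is not None else self.computed

    def has_major_discrepancy(self, tolerance: float = 0.3) -> bool:
        # Flag devices whose explicit level departs strongly from the computed one,
        # so that applications can keep the user informed.
        return self.explicit is not None and abs(self.explicit - self.computed) > tolerance


overridden = DeviceTrust(computed=0.2, explicit=0.9)
untouched = DeviceTrust(computed=0.2)
```

Keeping both values, instead of overwriting the computed one, is what allows the
discrepancy to be detected later.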

According to the description given above, the TMS composes the trust score in a
hierarchical fashion, as depicted in Figure 8.6, taking a holistic view of
trust assessment. To perform this composition, the TMS requires different types
of information for each device. The TMS operates in the broad context of a SIEM
platform and sources the required information from other SIEM platform modules, as
depicted in Figure 8.1.

Figure 8.6. Trust score composition dimensions and aspects.


8.5 Conclusions 


In this chapter, we presented an approach to trust computation in the Internet
of Things, which synthesizes behavioural, device status and associated risk aspects
into a comprehensive trust score that can be consulted to realize trust-based access
control. The proposed approach also considers device ownership relationships and 
owner-to-owner trust relationships, which are utilized in the trust computation 
process. 

Different parameters of the trust management computation process may be
configured and tuned; notably, varying approaches may be used to compute the overall
trust score based on the partial, dimension-specific scores; trust demotions may be
subject to aging, i.e. their effects may decay over time, or may remain in effect
until their root causes are known to be resolved; SIEM data may be associated with
confidence levels, and these levels could be considered in the overall trust score
computation. All these parameters are dependent on the particular context in which the
TMS operates. Our future work includes an in-depth study and analysis of these
aspects; additionally, the proposed TMS architecture will be evaluated to quantify
its overall performance, as well as its resilience against specific attacks that are
launched against IoT networks.
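
To illustrate the aging option mentioned above, the following sketch lets the
penalty of a trust demotion decay exponentially over time. The exponential form and
the 24-hour half-life are assumptions chosen for the example; the alternative policy
(keeping the penalty until the root cause is resolved) would simply skip the decay.

```python
def aged_penalty(
    initial_penalty: float,
    elapsed_hours: float,
    half_life_hours: float = 24.0,
) -> float:
    """Remaining penalty after exponential decay (assumed aging policy)."""
    if elapsed_hours < 0 or half_life_hours <= 0:
        raise ValueError("elapsed time must be >= 0 and half-life must be > 0")
    # The penalty halves every `half_life_hours`.
    return initial_penalty * 0.5 ** (elapsed_hours / half_life_hours)


fresh = aged_penalty(0.4, elapsed_hours=0.0)       # demotion just applied
after_day = aged_penalty(0.4, elapsed_hours=24.0)  # one half-life later
```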
The Internet of Things (IoT) environment is constantly changing, shaped by both
technical and social needs. The rapid advancement of the IoT, and the resulting increase
in the volume of data interconnected between services and infrastructure, which
may pose threats in cyberspace, motivated the conceptualization of the Cyber-Trust
project [1]. Cyber-Trust conducts extensive research in areas where IoT is widely
applied. The structure of the project relies, among other things, on taking
stakeholders’ needs into consideration, so that the results that the project produces
are realistic and based on the needs of final users.
In this context, an evaluation plan was designed to assess the platform’s operations.
This chapter presents the validation, verification and evaluation methodology
that Cyber-Trust followed during the first pilot phase of the project’s lifecycle.
The Cyber-Trust Evaluation Process describes how technical partners validate
technical components against the system’s specifications, and the methods by which
end-users evaluate all the functions of the platform. The validation, verification
and evaluation goals are in line with the project’s objectives. This chapter is also
guided by the project deliverables related to (a) use case scenarios, (b) the
Cyber-Trust architecture, (c) end-user requirements, and (d) the integration of the
overall system.


9.1 Introduction 


Validation, verification, and evaluation are methods that exist under the same
umbrella of the entire Evaluation Process. As the evaluation is the final stage in
which the total “product” is assessed by actual or potential users, we refer to the
whole process by this name. In many cases these methods are embedded
in each other, but at different stages. From now on, for the sake of simplicity,
when we want to indicate the overall assessment procedure, we will refer to
the Evaluation Process, which includes the three aforementioned methods as three
different stages.

On the whole, validation assesses whether the product was constructed according to
the criteria (requirements) given by end-users, answering the question “Does the
developed system do what is intended?”; verification assesses whether the system
executes specific functions according to the system’s specifications, answering the
question “Did we build the product right?”; and evaluation assesses whether the
developed platform as a whole has met the end-users’ needs.

Validation, verification and evaluation methods have been formulated and
implemented by various companies, enterprises and projects. Many multi-level
frameworks have been developed to assess different products, including both
objects and methodologies. Their scope is, among others, to ensure quality, to
enhance the performance of the product and, based on the acquired results (if the
evaluation is continuous), to define the next steps.

State-of-the-art evaluation process frameworks are reviewed below, showing that
the framework used to build the Cyber-Trust assessment methodology is extensible
and customizable.
9.2 State of Knowledge


9.2.1 General Evaluation Process 


This subsection introduces the general evaluation process upon which the
Cyber-Trust methodology is based. This frame is broadly used to evaluate the
final product and consists of specific step-by-step procedures:


1. Setting the frame (e.g., context, objectives, use cases, requirements etc.).

2. Designing the system.

3. Defining the evaluation groups, evaluation objectives, evaluation strategy etc.

4. Setting up and executing pilot trials in order to evaluate the “product”.

5. Evaluation results and assessment.


Based on Step 5, the evaluation is considered successful or not. For
improvement purposes, when the first evaluation iteration is completed, Step 5 can
provide feedback to Step 3, from which the process continues until the end of the
second iteration, and so on.

Almost the same steps are used by the frameworks in Section 9.2.2, where the
assessment took place on different types of “products”. Thus, the conclusion drawn
is that the evaluation methodology is used regardless of the type of the evaluation
object. The Cyber-Trust Evaluation Framework is explained in Section 9.3.


9.2.2 Implemented Evaluation Framework


Innovate UK [2] is a national funding agency investing in science and research in the
UK that has implemented an evaluation framework to objectively understand how
a policy or other action was enforced and what the consequences were. It evaluates
its investment activities in three (3) areas, performing (a) process evaluation,
(b) impact evaluation and (c) economic evaluation. The framework follows
a circular flow, in which the evaluation of the first cycle provides feedback to
the second cycle and modifies the rationale with which the new cycle begins.

The Evaluation Framework for National Cyber Security Strategies (NCSS) [3]
aims to improve cyber-security policy guidelines by assessing the system and
providing improvements to the defined strategy. It consists of 4 phases:
(a) developing the strategy, (b) executing the strategy, (c) evaluating the
strategy, and (d) maintaining the strategy. For evaluation purposes, a set of
evaluation objectives has been set for each evaluation phase.

The National Institute of Standards and Technology (NIST) [4] has published
the Cyber Security Framework (CSF) to develop a standardized approach to cyber
security assessments for all sectors of the state’s critical infrastructure. The CSF can
be tailored to a variety of technologies, life-cycle stages and enterprises. The stages in
the general work process are (a) defining the scope and priorities, (b) orientation,
(c) creating a current profile, (d) risk assessment, (e) creating a target profile,
(f) identifying, evaluating, and prioritizing gaps, and (g) implementing the action plan.

PDCA (Plan-Do-Check-Act) [5] is an iterative, four-stage approach for continually
improving processes, products or services, and for resolving problems. It
involves systematically testing possible solutions, assessing the results, and
implementing the ones that have been shown to work. The PDCA/PDSA framework is
effective in a wide range of organizations. It can be used to improve any process or
product by dividing it into smaller steps or stages and working to improve each one.


9.3 Evaluation Framework of Cyber-Trust 


From the beginning of the project, Cyber-Trust set the basis of the evaluation
framework by introducing deliverables related to use case scenarios, end-user
requirements, the platform’s architecture, and tool specifications, which provided
the core elements feeding the evaluation process. However, the actual evaluation
process began after the 1st integration phase, once a concrete platform had been
created that could be used as a pilot during the evaluation phase.

Before the evaluation through the pilot starts, the evaluation material is synthesized
and distributed to the end-users. The evaluation elements are analysed below
in Section 9.3.5. The 7 steps depicted in Figure 9.1 are also analysed in
this chapter.


9.3.1 Context 


The context of evaluating Cyber-Trust was constructed to reach two (2) goals. The
former is to reassure users about the platform’s features and offerings, and the latter
is to quantify the solution’s impact, in order to establish it in the end-user
community (see Section 9.4). The consortium should first validate whether the
developed solution meets the end-user acceptance criteria, reaching the proper
thresholds for each component (e.g., cyber-attack detection rate at both device and
network level etc.).

Cyber-Trust aims to advance environments with a secure setting in which European
citizens feel guarded, have a sense of autonomy, and feel secure in the context
of digital framework security. Therefore, the project aims not only to strengthen
the current state of the art in a variety of cyber-security domains, but also to
use advanced cyber-threat intelligence operations, identification, and mitigation
mechanisms to resolve the challenges of securing the environment of IoT devices.

Figure 9.1. Cyber-Trust Evaluation process inside the structure of the project (Image used
from D8.5).


9.3.2 Objectives 


A number of strategic objectives were established to ensure a successful pilot imple- 
mentation, testing, and evaluation process. 

Starting with the implementation of the first Proof of Concept (PoC) of the pilot
test, accompanied by the analysis of the gathered results, an operational system that
provides all the expected services was developed. The pilot testing process, which
is an essential part of the priorities, follows. The platform will be thoroughly
tested for achieving specific goals, such as detecting specific cyber-attacks at the
device and network level (e.g., zero-day vulnerabilities), monitoring and developing
a framework for efficient continuous vulnerability assessment and remediation,
improving IoT network resistance to specific types of attacks (e.g., DDoS), and
finally, providing advanced threat intelligence.
Also, security, reliability, efficiency, interoperability, and scalability are all critical
evaluation goals that contribute to a successful evaluation and thus pilot testing
process. As a result, the cyber-security platform will have advanced far beyond the
current state of cyber security, ushering in a new era for the next generation of
cyber-security architectures.


9.3.3 Assessor Teams 


The end-user groups involved in the evaluation process were Smart Home Owners
(SHOs), Internet Service Providers (ISPs), and Law Enforcement Agencies (LEAs).
The three (3) groups evaluated the Cyber-Trust platform via three (3) different
customized User Interfaces (UIs). An additional UI is dedicated to the ICT
Administrators of ISPs. Each stakeholder group accesses different component
functionalities, as each UI was designed solely to meet the daily needs of its
stakeholders, as depicted in Table 9.1.


9.3.3.1 End-Users high level needs 


Table 9.1. Main purposes behind the demands of the stakeholders.


End-Users                     Targets

Smart Home Owners (SHOs)      • Safeguarding Smart Home Devices and Infrastructure
                                o Monitoring the health status and risk levels of
                                  smart home assets.
                                o Detecting abnormal traffic behavior and notifying of
                                  minor or critical vulnerabilities or possible attacks.
                                o Alerting SHOs of cyber-attacks at device and network
                                  level.
                                o Updating device and infrastructure security settings.

Internet Service              • Safeguarding Customers
Providers (ISPs)                o Monitoring customers’ network infrastructure.
                                o Providing crucial information to LEAs when it is
                                  requested by their customers.

Administrators (Admins)       • High-level orchestration of the ISP UI account.

Law Enforcement               • Improving Chain of Custody
Agencies (LEAs)               • Reducing the time needed to exchange information, which
                                might contain forensic evidence, regarding cyber-attacks
                                between LEAs and Internet Service Providers
Figure 9.2. End-user extraction methodology.


9.3.3.2 End-User requirements methodology


The extraction of the end-user requirements came from both the research analysis 
and the aggregation of user demands. Four sources were used to determine the 
platform’s requirements. The actions taken toward these sources are: 


• The analysis of existing industry solutions and research activities (domain
knowledge)

• The analysis of Cyber-Trust use cases

• Dedicated workshops conducted with the end-user groups

• Creation of targeted Questionnaires (5 Questionnaires in total)


The methodology is outlined in Figure 9.2.
The end-user requirements were divided into functional and non-functional
categories based on the content of each requirement, and then prioritised using the
MoSCoW methodology.
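
As a brief illustration of this categorisation and prioritisation, the sketch below
orders a handful of requirements by their MoSCoW class (Must, Should, Could, Won't).
The requirement identifiers and their classifications are invented for the example
and do not correspond to actual Cyber-Trust requirements.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class MoSCoW(IntEnum):
    """MoSCoW classes, ordered from highest to lowest priority."""
    MUST = 0
    SHOULD = 1
    COULD = 2
    WONT = 3


@dataclass(frozen=True)
class Requirement:
    identifier: str      # hypothetical identifier
    functional: bool     # True: functional, False: non-functional
    priority: MoSCoW


def prioritise(requirements: List[Requirement]) -> List[Requirement]:
    # sorted() is stable, so requirements of equal priority keep their input order.
    return sorted(requirements, key=lambda requirement: requirement.priority)


requirements = [
    Requirement("REQ-C", functional=True, priority=MoSCoW.COULD),
    Requirement("REQ-A", functional=True, priority=MoSCoW.MUST),
    Requirement("REQ-B", functional=False, priority=MoSCoW.SHOULD),
    Requirement("REQ-D", functional=False, priority=MoSCoW.MUST),
]
ordered = [requirement.identifier for requirement in prioritise(requirements)]
functional_ids = [r.identifier for r in prioritise(requirements) if r.functional]
```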


9.3.3.3 Cyber-Trust components


Cyber-Trust contains a variety of components designed to achieve the scope of the
project. The roles and the responsibilities of the components were initially described
in the architecture documentation and then refined in technical deliverables,
tailored to the architectural and operational needs of the tools.

Some of the Cyber-Trust components presented in Figure 9.3 are used
in the backend system and are not available to the users. Thus, these components
are not evaluated by the stakeholders. The only components that are assessed are
those with a graphical user interface.
Figure 9.3. Components of Cyber-Trust (Image used from the Cyber-Trust dissemination
video).


Table 9.2. Capabilities distribution among components.


C-T Components & Services      Detection   Protection   Mitigation   Storage and Sharing

Crawling Service x
Profiling Service x
Smart Device Module x x
Registration Module x
Trust Management Service x x x
Intelligent Intrusion Response x x
Smart Gateway Module x x
Distributed Ledger Technology x
Network Repository x
Cyber-Defense Service x x


In Table 9.2 the components are classified based on their capabilities. The fol- 
lowing is a descriptive analysis of the tools based on their capabilities: 


• Crawling Service is responsible for detecting web pages and security-related
websites regarding cyber-threat intelligence in order to identify emerging
threats, exploitation kits and zero-day vulnerabilities.

• Profiling Service centrally stores information and profiles connected to
Cyber-Trust devices and detects the correlation of devices’ existing
information with newly acquired data from other secure repositories and sources.

• Smart Device Module runs on the device and informs users about their
device's health status (such as vulnerability detection, firmware updates,
etc.). The users are informed via alerting channels, such as mobile-app
messages.

• Registration Module provides registration capabilities to various actors, such
as users and organizations, including Smart Home Owners (SHOs), Internet
Service Providers (ISPs) and Law Enforcement Agencies (LEAs).

• Trust Management Service gathers the actions/behaviours and the
vulnerabilities of the IoT devices and responds accordingly by increasing or
decreasing trust.

• Intelligent Intrusion Response runs on a network gateway at the user
premises, providing continuous monitoring of the Smart Home's security status
and computing possible mitigation actions against sophisticated cyber-attacks.

• Smart Gateway Module is a component that runs on the network gateway and
uses Machine Learning techniques to identify network anomalies.

• Distributed Ledger Service (Blockchain) provides integrity-protected storage
and enhanced sharing capabilities through the blockchain. Its principal
operations include the storage of data related to forensic evidence, the
validation of transactions, consensus, etc.

• Network Repository is a set of tools used to collect, manage and store
information on a network’s architecture, including the topology and the
security defences.

• Cyber-Defense Service deals with the detection and mitigation of
cyber-attacks on networks.
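
The behaviour described for the Trust Management Service (raising or lowering
trust in response to observed device behaviour and vulnerabilities) can be
illustrated with a minimal sketch. The event names, weights, starting score and
the `DeviceTrust` class below are hypothetical and chosen only for illustration;
the text does not specify the rules the actual service applies.

```python
from dataclasses import dataclass


@dataclass
class DeviceTrust:
    """Trust state of one IoT device; the score is kept within [0.0, 1.0]."""
    device_id: str
    score: float = 0.5  # hypothetical neutral starting value

    def update(self, delta: float) -> float:
        """Raise (delta > 0) or lower (delta < 0) trust, clamped to [0, 1]."""
        self.score = min(1.0, max(0.0, self.score + delta))
        return self.score


# Hypothetical event weights; the real service's rules are not given in the text.
EVENT_WEIGHTS = {
    "firmware_updated": 0.10,
    "normal_behaviour": 0.02,
    "vulnerability_found": -0.20,
    "anomalous_traffic": -0.30,
}


def apply_events(device: DeviceTrust, events: list[str]) -> float:
    """Apply a sequence of observed events; unknown events leave trust unchanged."""
    for event in events:
        device.update(EVENT_WEIGHTS.get(event, 0.0))
    return device.score


if __name__ == "__main__":
    camera = DeviceTrust("camera-1")
    print(apply_events(camera, ["vulnerability_found", "firmware_updated"]))
```

Clamping the score keeps repeated positive or negative events from pushing it
outside a fixed range, which makes scores comparable across devices.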


9.3.4 Integration Phase 


Cyber-Trust entails two (2) integration phases within its lifecycle; at present,
the first phase has been successfully completed. Its importance stems from the
fact that, through this phase, the Cyber-Trust components (a) became functional
and (b) were interconnected as a unified system. Three consecutive tests were
incorporated into a complete integration methodology: (a) Functional Testing,
(b) a Stress Test Plan (including Load and Stress Tests and Dimensioning of
Resource Utilization) and (c) a Penetration Testing Plan. The system integration
and overall functional testing focused on workflows (Use Cases), and the aim was
to ensure that the components’ messages are transmitted correctly and that the
Cyber-Trust components communicate properly. Each workflow has been analysed,
and the communication links among the various components have been identified.
The first version of the integrated platform is used in the first pilot phase.

Figure 9.4. Cyber-Trust development and evaluation overall plan (Image used from D8.1).


9.3.5 Pilot and Evaluation Process 


As shown in Figure 9.4, the evaluation process was implemented in two (2)
repeating Cyber-Trust development cycles, or “sprints”, in total. Cyber-Trust
captured and implemented the end-user requirements in the first sprint and then
proceeded with the system implementation before the first pilot phase. Before
the start of the second “sprint” and throughout its duration, the comments
collected during the first pilot are applied. The goal of this structure is for
end-users to receive the product that they want and to benefit from using it.


9.3.5.1 Pilot trials 


Pilot tests were carried out both synchronously and asynchronously. Synchronous
tests were performed in real time during the pilot trials, over a series of
system evaluation sessions utilizing a dedicated six (6) hour slot. Asynchronous
tests took place remotely, at the evaluators’ own pace, throughout the first
(1st) pilot period. Both testing methods enabled end-users to run the platform
and gain experience with it, while also providing valuable feedback and comments
for the second evaluation phase. In both testing methods, human rights, GDPR
(Regulation (EU) 2016/679) compliance and e-privacy regulations were applied to
all pilot cases.


9.3.5.1.1 Pilot scenarios 


As the platform was exercised both synchronously and asynchronously, scripts
were created for both kinds of testing. For the former, one (1) consolidated
pilot scenario with numerous test cases was created, in which all the evaluators
were able to participate. During the live trials, real attack scenarios were
carried out, familiarising the end users with handling cyber-attacks. For the
latter, four (4) user-oriented pilot scenarios with multiple test cases were
constructed and distributed to the end-users, giving them the opportunity to
execute all the test cases before or after the live tests. Through those
scenarios, users were able to review the features and rules implemented in the
platform (visualised through the UI).


9.3.5.2 Functionality verification 


Cyber-Trust has created a Functionality Verification plan, which includes all
the appropriate actions for verifying the project’s functions, as specified by
the end-user groups of the project. Functional and non-functional requirements
were included in the Functionality List. During the pilot scenarios mentioned in
Section 9.3.5.1.1, the Functionality List was revised and completed with the
verification status of each requirement (Achieved, Not Achieved, Partially or
Modified). Since the end-user requirements were converted to system
specifications in the first year of the project, the verification status of the
end-user requirements answers the question “Does this developed system do what
is intended?” [6].


9.3.5.3 Components validation (KPIs) 


Key Performance Indicators (KPIs) were used to validate the system and its
components on the basis of numerical metrics. The technical partners and the
end-users were able to validate the platform, its components and the
pilot-oriented KPIs. Put simply, validation is the procedure that answers the
question “Did we build the right product?” [7]. The KPIs of the Cyber-Trust
product are measured constantly during the pilot and integration phases.
Compared to the last integration phase, recent measurements show that the KPI
values are increasing sharply (with a minority remaining constant), indicating
that the quality of the product's performance continues to rise.


9.3.5.4 Usability questionnaire


A single main questionnaire was developed in the context of Cyber-Trust. The
questionnaire contained closed-ended Likert scale questions, and its layout
focuses on two key areas: platform satisfaction, and efficiency and
effectiveness of platform operations. In both areas, the questionnaire framework
and questions were tailored to each of the target end-user audiences.

The Cyber-Trust questionnaire is based on the System Usability Scale [8] (SUS)
and Technology Acceptance Model [9] (TAM) methodologies. SUS is a reliable tool
for measuring usability. Each question offers the respondent five or three
Likert scale options, ranging from strongly agree to strongly disagree. In TAM,
two major factors influence a user’s decision about how and when to use a
technology: (a) perceived usefulness and (b) perceived ease of use. An end
user's decision to use a designed approach is influenced by the individual’s
attitude toward using it, which in turn is influenced by its perceived utility
and ease of use. In Cyber-Trust, the two methodologies mentioned above are used
to measure perceived ease of use and perceived usefulness, in order to provide a
consistent and coherent analysis.
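
As background to the SUS methodology mentioned above, the following sketch shows
the standard SUS scoring rule: ten statements answered on a 1 to 5 scale, with
odd-numbered (positively worded) items contributing the answer minus 1,
even-numbered (negatively worded) items contributing 5 minus the answer, and the
sum multiplied by 2.5. This is the generic rule only; the Cyber-Trust
questionnaire is described as being based on SUS with tailored questions, so its
actual scoring may differ, and the function name `sus_score` is an assumption.

```python
def sus_score(responses: list[int]) -> float:
    """Standard SUS score (0 to 100) from ten answers on a 1-5 Likert scale."""
    if len(responses) != 10 or any(r < 1 or r > 5 for r in responses):
        raise ValueError("SUS expects ten answers, each between 1 and 5")
    total = 0
    for index, answer in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded: higher is better.
            total += answer - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded: lower is better.
            total += 5 - answer
    return total * 2.5


if __name__ == "__main__":
    # A respondent who answers "neutral" (3) to every statement.
    print(sus_score([3] * 10))
```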


9.4 Evaluation Impact 


Aside from the efficacy and performance criteria, the accessibility of web-based
systems has recently become more important, with user satisfaction being one of
the most powerful determinants. The academic literature has investigated the
usability issues of web-based platforms, and prior studies have offered valuable
insights into their performance. A systematic analysis is nevertheless necessary
to examine the work performed in a cybersecurity environment, compare the
gathered findings, identify the targeted topics and the challenges that remain
unresolved, and discuss future research topics that may be pursued.

In addition to the above, impact assessment is frequently used to determine
whether a platform has been fully incorporated. It is also used to address
product design issues, such as determining which of the alternative solutions
considered for a platform is the most promising. The second pilot phase was
completed by the Cyber-Trust end-user groups (Table 9.3), and the questionnaire
results analysing the impact on the end-user community are given below.

Table 9.3 and Figure 9.5 depict the distribution of end users. The figure also
illustrates the normal distribution. The result is reasonable considering the
distribution of Cyber-Trust end users and the available network interfaces.


Table 9.3. End users distribution statistics.

End_users_distribution   Frequency   Percent   Valid Percent   Cumulative Percent
LEAs                          5        15.2         15.2              15.2
ISPs                          8        24.2         24.2              39.4
ISPs in 3D workshop           4        12.1         12.1              51.5
ADMINs                        3         9.1          9.1              60.6
SOHOs                        13        39.4         39.4             100.0
Total                        33       100.0        100.0


Figure 9.5. End users distribution graph.


The Cyber-Trust consortium also set out to establish whether the responses were
consistent and trustworthy. Cronbach’s alpha (or coefficient alpha) was devised
by Lee Cronbach to assess the internal consistency of multiple-question Likert
scale surveys. The overall consistency of a measure is expressed by this
reliability coefficient, which ranges from 0 to 1. With an average internal
consistency of 0.968, the end users' responses regarding the CT platform reached
a reliability of 96.8%.


Figure 9.6. Cronbach’s alpha interpretation.

Figure 9.7. Cronbach’s alpha results as indicated by the four different End user groups.


The majority of replies to the evaluation questionnaire indicate that the
platform’s user friendliness is rapidly improving. With a user-friendliness
score of 62%, the CT appears to be a user-friendly platform. Furthermore, in
terms of navigation and time consumption, end users find it convenient and in
line with their requirements. With an average score of 60%, the CT also appears
to be a platform that adapts to varied end-user needs. In response to the
statement “I felt quite confident in completing all of my work utilizing the
Cyber-Trust platform”, 67 percent of those polled agreed (Figure 9.9). These
data demonstrate how the relevant end users rate the “ease of use” of the
Cyber-Trust platform. Finally, 58 percent indicated that they were able to
access and obtain all of the information they required (Figure 9.10), indicating
the platform's perceived utility.


Figure 9.8. The percentage of evaluators answered to “I found easy to learn how to
navigate within the Cyber-Trust platform” question.

Figure 9.9. The percentage of users answered to “I felt very confident in completing all
of my work using the Cyber-Trust platform” question.

Figure 9.10. I managed to access and retrieve all information needed.


In terms of platform efficacy and efficiency, 100 percent of the end-user
audience believes that the Cyber-Trust user interface (UI) follows the
three-click rule for acquiring information concerning cyber-attacks (Figure
9.11). Furthermore, 46% of the Cyber-Trust community end users strongly agree
that Cyber-Trust is vital to protect their IoT devices from malicious
cyber-attacks (Figure 9.12). Finally, regarding Cyber-Trust’s 3D component, 100
percent of users said that they did not feel any disorder (such as illness)
throughout the 3D interaction (Figure 9.13) and that they could easily navigate
and view all the information of a specific CVE ID (Figure 9.14).


Figure 9.11. The percentage of users answered to “In the 2D-UI: By the time I logged into
the system, I used at least 3 clicks rule for accessing information related to cyber-attacks”
question.

Figure 9.12. The percentage of users answered to “I would imagine that most end-users
would agree that Cyber-Trust is necessary to safeguard their IoT devices against
malicious cyber-attacks” question.

Figure 9.13. I did not experience any disorder (e.g. sickness) during the 3D interaction.

Figure 9.14. I managed simply to navigate and view all the information of a specific
CVE ID.
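
For readers who wish to reproduce an internal-consistency figure of the kind
quoted in this section, Cronbach's alpha for k questionnaire items is
alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores).
The sketch below computes it from a respondents-by-items matrix. The sample
answers are invented for illustration and are not Cyber-Trust data; the function
name `cronbach_alpha` is likewise an assumption.

```python
from statistics import pvariance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha, where responses[r][i] is respondent r's answer to item i."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Variance of each item (column) across respondents.
    item_variances = [pvariance([row[i] for row in responses]) for i in range(k)]
    # Variance of each respondent's total score.
    total_variance = pvariance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Invented example: three respondents, two Likert items.
    sample = [[1, 2], [2, 2], [3, 5]]
    print(round(cronbach_alpha(sample), 3))
```

Population variance is used for both the items and the totals; using sample
variance for both gives the same alpha, because the correction factor cancels in
the ratio.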


9.5 Conclusions 


In a nutshell, collecting and analyzing data from pilot activities reveals the
satisfaction rate of the stakeholders and the level of the system’s performance.
More specifically, the intercorrelation of the project’s tasks (covering use
cases, user requirements, state-of-the-art deliverables and the description of
tools) from the beginning enabled Cyber-Trust to record the needs of the
stakeholders as well as the areas of application of the platform. The evaluation
methodology, designed on the basis of known standards (SUS, TAM), was adapted to
the scope of the project, and the evaluation material was designed to assess the
technological advancements of Cyber-Trust. In addition, comments received during
the pilot phase eventually led to the drastic modification or enhancement of
evaluation elements. Consequently, the Cyber-Trust Evaluation Process is vital
not only for gathering information and evaluating pilots but also for providing
feedback on which features of the graphical user interfaces (GUIs) and which
procedures need to be improved. The results obtained through the Cyber-Trust
platform will lead to the advancement of emerging solutions that improve the
commercial visibility and feasibility of a high technical readiness level
product that offers a comprehensive solution to cyber security issues.
We present a testbed which hosts and interconnects ten (10) simulated, 750
emulated and one cyber-physical Smart Home (SoHos). The SoHos are digitally
organised in three testbeds.

This chapter provides information regarding the design, architecture and
implementation of this large number of SoHos, deployed for running multiple
cyber-attacks (more than 20 different attacks) in order to test and validate the
capabilities of the Cyber-Trust platform developed during the European
Commission co-funded research and innovation Horizon 2020 Cyber-Trust project [1].

The chapter is structured as follows. Section 10.1 discusses the significance of
these testbeds, from both a marketing and an exploitation perspective, for the
different types of organisations that stand to benefit from the results and
relevant information arising from the use of the aforementioned platform.
Sections 10.2 to 10.4 present the requirements, both technical and
non-technical, the interconnectivity of the different, heterogeneous
technologies present and the tools used. Section 10.5 presents the main results,
followed by some discussion of them. Section 10.6 is dedicated to the
exploitation of the results, their impact on potential business and possible
extensions.


10.1 Introduction 


Nowadays, the mass production of affordable, easy-to-access and easy-to-use
smart devices, in combination with increasing and improving telecommunications
network coverage, has led to the advent of the so-called SoHos. These
environments are extremely complex: data pass through multiple coexisting
networks (often of different types) that reside at the same place; different
protocols, such as 4G, 5G, Wi-Fi, etc., coexist; and there is a need for
continuous machine-to-machine communication with its associated protocols (e.g.
Bluetooth). Security and privacy issues add to this complexity, since the same
data travel via different protocols while being exposed to the internet.

SoHos are growing in popularity and are being adopted by an increasing number of
people all around the globe, in both non-commercial and commercial environments.
Evidently, there are entities in the latter category, such as organisations,
companies and bodies, which can greatly benefit from the results produced, the
conclusions drawn and the lessons learnt from research on SoHos. These entities
include, but are by no means limited to, the following main categories:


• Information and Communication Technologies (ICT)

• Research organisations, carrying out and/or interested in pilot testing

• Security organisations/companies

• Any technology organisations/companies with a focus on or active in the field
of SoHo technologies, services and/or the associated smart devices


Therefore, having actively contributed to the field of security technologies and
testbed set-up in general, and having conducted research in the field under
consideration through KEMEA’s active participation in the testbeds and the
execution of the pilots of the project “Cyber-Trust (CT)”, OTE’s/Cosmote’s
contribution to the testbed, and CGI’s contribution to exploitation and market
uptake, we are in a very good position to present and share valuable results and
specifications, based on the successful experiments carried out within the
context of “Cyber-Trust” [1]. These exploitable results of “Cyber-Trust” mainly
fall within the scope of potential business and exploitation solutions,
strategies and the associated business, financial, technological and research
areas.

We now proceed with the specifications, both technical and non-technical,
entailed in the set-up of the testbeds, their interconnectivity, the different,
heterogeneous technologies present, as well as some noteworthy tools,
requirements and details.


10.2 Cyber-Trust Testbed Specifications 


First of all, the testbeds have been set up with the aid of different,
intrinsically heterogeneous virtualization technologies, namely:


• Microsoft Hyper-V, which is installed on KEMEA’s premises and has been used
to set up KEMEA’s testbed and the associated Virtual Machines (VMs); this is
not cloud-based.

• OpenStack, which is open-source cloud software; it is installed on OTE’s
premises and has been used to set up OTE’s SoHo VMs.

• A variety of Operating Systems, used for the cyber-physical SoHo.


The testbed included not only the SoHos but also the Cyber-Trust platform and
the Command and Control Server for the Mirai, Black Energy, ZEUS and ZitMo
attacks.

So, the individual technologies mentioned above needed to be:


• interconnected,

• made to work continuously, in real time (or at least continually, at times),
and

• synchronised and capable of interacting with the user(s) as realistically as
possible, so as to be able to imitate the real-world functionalities and
characteristics of smart homes and/or smart home networks.


Undeniably, this testbed, a graphical representation of which is shown in
Figure 10.1, is truly complex from an infrastructure point of view as well as
from a connectivity and functionality one. Nevertheless, this high level of
complexity is not only justified but also necessary: the real-world system is
itself complex and involves a wide variety of coexisting technologies, so the
testbed must reflect that complexity if the simulation is to be as realistic as
possible. Therefore, considering the different technologies present in the
real-world situation (such as Bluetooth, 4G/5G, Wi-Fi and infrared, let alone
their different architectures, versions and implementations), the resources
needed and the cost of a real-world testbed, the large amount of time spent
setting up our complex testbed and the difficulty involved can be justified.


[Figure 10.1 labels: OpenStack; 4 emulated SOHOs @ KEMEA; 6 emulated SOHOs @ OTE;
750 simulated SOHOs @ OTE; 1 cyber-physical SHO.]


Figure 10.1. Graphical representation of the Cyber-Trust testbed.


The structure, components and basic characteristics and resources of each sim- 
ulated SoHo of our testbed are shown in Figure 10.2, below: 


[Figure 10.2 content: a table listing, for each of SOHO 1 to SOHO 10, its
virtual machine components and the resources allocated to it (including vRAM
and vHDD). Total resources allocated for the Smart Homes: 202, 652 and 3708.]


Figure 10.2. Virtual machine components of a typical testbed.


10.3 Interconnectivity via an ad-hoc Routing Process


The interconnectivity among the heterogeneous networks has been achieved by
means of a customised routing process with user-defined IP routing tables,
simulating the exposure of IP addresses by the provider as well as the routing
process within the domestic, business and industrial networks which host the
SoHos. The router emulator is an Ubuntu VM which implements the routing process;
it works together with the traffic generator, another Ubuntu VM, which is
responsible for generating the network traffic. Furthermore, to ensure that the
connections are established securely, the necessary dedicated certificates have
been issued and installed into each SoHo; the open-source software OpenVPN [2]
has been used for the establishment of secure connections via the Secure Sockets
Layer (SSL), along with the standard ssh service in Ubuntu. In the case of
different Operating Systems, OpenSSH tools have been used for the same purpose.
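
To make the idea of a user-defined IP routing table concrete, the sketch below
performs a longest-prefix-match lookup, which is how a router chooses among
overlapping routes. It is an illustration only: the networks, next-hop addresses
and function names are hypothetical and are not taken from the Cyber-Trust
router emulator, whose configuration is not given in the text.

```python
import ipaddress

# Hypothetical routing table: (destination network, next hop).
# All addresses are illustrative, not the testbed's real configuration.
ROUTES = [
    ("0.0.0.0/0", "172.16.0.1"),        # default route towards the provider
    ("192.168.1.0/26", "172.16.1.2"),   # one SoHo LAN
    ("192.168.1.64/26", "172.16.1.3"),  # another SoHo LAN
]


def build_table(routes):
    """Parse textual routes into (IPv4Network, next_hop) pairs."""
    return [(ipaddress.ip_network(net), hop) for net, hop in routes]


def next_hop(table, destination: str) -> str:
    """Return the next hop of the most specific route containing the address."""
    address = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in table if address in net]
    if not matches:
        raise LookupError(f"no route to {destination}")
    # Longest prefix wins: a /26 route is preferred over the /0 default route.
    _, hop = max(matches, key=lambda match: match[0].prefixlen)
    return hop


if __name__ == "__main__":
    table = build_table(ROUTES)
    for target in ("192.168.1.70", "192.168.1.10", "8.8.8.8"):
        print(target, "->", next_hop(table, target))
```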


10.3.1 Cyber-Trust SoHo Components 


A graphical representation of our deployed SoHo, together with its connections
and interactivity with the Internet Service Provider (ISP) as well as any
external or internal networks (e.g. WAN, LAN, etc.), is presented in Figure 10.3
below.


[Figure 10.3 labels: SOHO LAN 192.168.Z.N/26; ISP WAN 172.16.X.Y;
DHCP N+6 ~ N+62; ISP Provider.]


Figure 10.3. Deployed smart home (SOHO) ecosystem.
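
Figure 10.3 places each SoHo LAN in a /26 network. As a rough illustration of
what that implies, the sketch below derives the address plan of one such subnet
with Python's standard `ipaddress` module: 64 addresses, of which 62 are usable
hosts. The concrete subnet, the split into five statically assigned addresses
followed by a DHCP pool, and the function name `soho_lan_plan` are assumptions
made for this example.

```python
import ipaddress


def soho_lan_plan(cidr: str, reserved: int = 5) -> dict:
    """Describe a LAN subnet: usable hosts, static block and DHCP pool.

    `reserved` is the number of leading host addresses kept for static
    assignment (an assumption for this sketch); the rest form the DHCP pool.
    """
    network = ipaddress.ip_network(cidr)
    hosts = list(network.hosts())  # excludes network and broadcast addresses
    if reserved >= len(hosts):
        raise ValueError("no addresses left for the DHCP pool")
    return {
        "network": str(network.network_address),
        "broadcast": str(network.broadcast_address),
        "usable_hosts": len(hosts),
        "static": [str(host) for host in hosts[:reserved]],
        "dhcp_first": str(hosts[reserved]),
        "dhcp_last": str(hosts[-1]),
    }


if __name__ == "__main__":
    # Hypothetical SoHo LAN; the real Z and N values are not given in the text.
    for key, value in soho_lan_plan("192.168.10.64/26").items():
        print(f"{key}: {value}")
```

With a network address ending in N, this plan leaves the DHCP pool running from
N+6 to N+62, the last usable host before the broadcast address.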


10.4 Tools Used & Utilised - Methodologies Adopted 


Given that the testbeds under consideration lie in different infrastructures,
conversions of virtual hard disks to the formats of interest, cross-compilation
and building tasks play a significant role in enforcing and maintaining
compatibility across the different environments. Additionally, the establishment
of a continual testing/verification procedure, ensuring the viability of the
interacting testbeds, has been of paramount importance.

For the conversion among different virtual disk formats, open-source tools have
been used. The formats include VDI (Oracle VirtualBox, OpenStack), VMDK (Oracle
VirtualBox, VMware products, QEMU, Parallels Desktop for Mac, OpenStack), VHD
(Hyper-V, Oracle VirtualBox, OpenStack), VHDX (Hyper-V, OpenStack), the
Parallels version 2 HDD image format (Oracle VirtualBox), qcow2 (OpenStack,
QEMU) and raw, to mention just a few widely used ones. The tools include
qemu-img [3], the VBoxManage command-line tool and StarWind V2V Converter.
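
As an illustration of such a conversion, the sketch below builds a
`qemu-img convert` command line for moving a disk image between two of the
formats listed above, for example from Hyper-V's VHDX to the qcow2 format used
with OpenStack. The file names and the helper `convert_command` are
hypothetical; the `-f` (input format) and `-O` (output format) options and the
format identifiers are those of qemu-img, which refers to VHD as `vpc`. The
sketch only constructs the command; running it requires qemu-img to be
installed.

```python
import shlex

# qemu-img format identifiers for the disk types named in the text.
QEMU_FORMATS = {
    "VDI": "vdi",
    "VMDK": "vmdk",
    "VHD": "vpc",    # qemu-img calls the VHD format "vpc"
    "VHDX": "vhdx",
    "qcow2": "qcow2",
    "raw": "raw",
}


def convert_command(src: str, src_type: str, dst: str, dst_type: str) -> list[str]:
    """Build the argument list for converting src (src_type) to dst (dst_type)."""
    return [
        "qemu-img", "convert",
        "-f", QEMU_FORMATS[src_type],   # input format
        "-O", QEMU_FORMATS[dst_type],   # output format
        src, dst,
    ]


if __name__ == "__main__":
    # Hypothetical file names for one SoHo VM disk.
    command = convert_command("soho1.vhdx", "VHDX", "soho1.qcow2", "qcow2")
    print(shlex.join(command))
    # To execute: subprocess.run(command, check=True)
```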

Moreover, several types of experiments have been carried out, including
cyber-attacks and the logging of the attacks. Severity alerts have been
generated and graphically presented through a Graphical User Interface (GUI),
i.e. the platform interface, which was built and set up within the same project
to support Human-Computer Interaction (HCI).


10.5 Results & Discussion 


We now present the generated results. The testbeds' functionality has proved to
be excellent. More specifically, the phases of Cyber-Trust assessment,
evaluation, integration, testing, pilot execution and results analysis, together
with the corresponding results (i.e. functionality verification results,
evaluation results, performance measurement results, etc.), have been made
available to date. These include:


(i) System integration and overall functional testing results: the Cyber-Trust
Platform is based on an event-driven, loosely coupled service-oriented
architecture that implements a publish/subscribe approach, supported by direct
component communication via RESTful interfaces.

(ii) Performance testing results: load and stress testing have been conducted,
and appropriate Key Performance Indicators (KPIs) have been defined and
evaluated. The tests include regression testing, connectivity and accessibility
of services testing, load and stress testing, etc.

(iii) End-user evaluation results: the evaluation process covers the different
methods used to assess the evaluation material. It also presents the evaluation
material (e.g. manuals, questionnaires, test case scenarios, requirements, KPIs,
etc.). In more detail, synchronous and asynchronous types of tests were used to
evaluate the platform as a whole and its services. Synchronous tests were
carried out concurrently in a series of system evaluation sessions using a
dedicated three-hour (3) slot, with the involvement of various stakeholders.
Asynchronous tests were performed remotely, at the pace of the evaluators. A
training demo for the end-users has been carried out as well. In both testing
methods, the functional requirements were verified by the related end-user
group, and the non-functional requirements were verified by both technical
partners and evaluators. Moreover, a usability test examining efficiency,
effectiveness, satisfaction, ease of use and usefulness was shared.

(iv) Penetration testing and results (to be extracted/published): these will

• determine the minimum level of security;
• encompass best practices mentioned, based on OWASP, ASVS and ISO 27001;
• include penetration testing at application level;
• be associated with session management, authentication and access control;
• take into account password complexity, user management, edit/recover
password, open ports, reverse proxy, etc.;
• incorporate code review of components utilising automated means.
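
The publish/subscribe approach mentioned in item (i) can be sketched in a few
lines. The example below is a minimal in-process event bus, given only to
illustrate the loose coupling between publishers and subscribers; the class name
`EventBus`, the topic name and the message fields are hypothetical, and the
actual Cyber-Trust platform uses its own messaging infrastructure, which is not
described here.

```python
from collections import defaultdict
from typing import Any, Callable


class EventBus:
    """Minimal in-process publish/subscribe bus keyed by topic name."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        """Register a handler to be called for every message on a topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Any) -> int:
        """Deliver a message to all subscribers of a topic; return their count."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


if __name__ == "__main__":
    bus = EventBus()
    # Hypothetical topic and payload: a gateway reports an anomaly, and an
    # alerting component reacts without the two knowing about each other.
    bus.subscribe("network.anomaly", lambda msg: print("alert:", msg))
    bus.publish("network.anomaly", {"device": "camera-1", "severity": "high"})
```

The publisher only knows the topic name, not who is listening, which is what
allows components to be added or replaced without changing the others.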


All the aforementioned results extend far beyond the present context; for
instance, to scientific areas including e-privacy, GDPR, ethics, etc. An
overview of the Evaluation and Assessment procedure is presented in Figure 10.4
below.

Figure 10.4. Evaluation and assessment procedure.


10.6 Exploitation of Results & Impact on Business 


The aforementioned results can be greatly exploited, first of all by the
consortium members of the Cyber-Trust project, and also by the organisations,
companies and authorities engaged in the following or similar fields [4]:


• ICT
• Research with piloting
• (Information) Security
• Cyber Crime
• Smart home solutions/devices/electronics/equipment
• Smart devices

Additionally, the exploitation of the results extends far beyond the categories 
mentioned above. More specifically, setting out from the results collected, processed 
and post-processed after carrying out the cyber-attack simulations [5, 6] and the 
associated pilots as well as the related [7, 8], the interested entities can take full 
advantage of them within the following contexts: 


1. Simulation of cyber-attacks and prediction of their impact on business:
multiple scenario-based subsequent simulations, with and without (possibly)
affected components, and evaluation of the impact on business together with
disaster recovery scenarios and optimisation (optimal scenario/scenario
adoption). Dynamic optimisation is possible.

2. A step closer to (co-)simulation-in-the-loop, side by side with the
real-world business activities.

3. Testing and hardening of processes and components, upgrades of self-defence
and cybersecurity components, improvement of failover strategies.

4. Improvement of existing smart home devices, equipment and software.

5. New smart home devices, equipment and software.

6. Improvement of interconnectivity among (technologically) heterogeneous
smart homes and smart home devices.

7. Development of strategies for bridging and tackling different, currently
incompatible smart components and/or devices, including but not limited to
those bearing agnostic components and/or closed-source code.


10.7 Conclusion 


In this chapter, we have presented the results from our simulated, tested SoHo
platform and their exploitation potential in several fields, mainly from a
business perspective, as well as their impact on business and possible
extensions. We have also analysed the challenges faced, as far as complexity is
concerned, in terms of the interconnectivity of inherently different, though
necessarily interacting and cooperating, technologies and protocols. To this
end, we have also presented our successfully adopted methodology and the custom
routing process that ensures interconnectivity as well as smooth, uninterrupted
cooperation among components. Last, but not least, we have discussed the broad
applicability of our implementations and justified the need for setting up such
complex, heterogeneous and resource-demanding testbeds towards the realisation
of a realistic, nearly real-world simulation environment.
Securing Today’s Complex
Digital Realities 


By A. Rajkumari* and C. Wallace’ 
Today’s organizations require agility and innovation to deliver seamless digital
experiences—anytime, anywhere. In response, customer, employee and supplier 
ecosystems have become more complex, connected and open. At the same time, 
cyber risks and threats are growing in velocity and complexity. 

To address these challenges, enterprises need a balanced and proactive cyberse- 
curity approach. This includes managing human and non-human digital identi- 
ties and access, protecting both information and operational technologies, secur- 
ing multi-cloud environments, safeguarding automation and artificial intelligence 
workloads, and complying with increasing regulations. 
Our cybersecurity approach for today’s modern work environments has been
tested and proven. We bring accelerators in the form of maturity models, reference 
architectures, technical know-how, cross-domain expertise, risk management meth- 
ods, and client lessons learned to accelerate and empower your business. With CGI’s 
Cybersecurity Advisory Services and Accelerators, you can increase agility and inno- 
vation while ensuring holistic management of cyber risks. 

In this chapter, we cover the new digital reality and what it means for cyberse- 
curity, and how CGI is helping its clients secure their connected operations. 


11.1 Today’s Digital Reality and What it means for 
Cybersecurity 


Enterprises are continuously evolving to deliver value to customers, citizens, 
employees and shareholders at pace in response to fast-changing needs. 

New technologies, data sources and connections are enabling this evolution, 
including multi-cloud environments, edge computing, automation, artificial intel- 
ligence (AI), Internet of Things, 5G, micro-services, devices, and application pro- 
gramming interfaces (APIs). However, cyber threat actors are harnessing these same 
advances to create an increasingly sophisticated and dynamic risk landscape. The 
cybersecurity arms race is escalating. 

Enterprises also are expanding their supplier ecosystems and customer bases. 
Many are involved in mergers, acquisitions, divestitures and reorganizations, and 
have increasingly hybrid workforces (human and non-human) operating from 
almost anywhere. 


GOOD TO KNOW 


Growing importance of cybersecurity [1]: 


• Cybersecurity is the most frequently mentioned business priority

• 64% say securing cloud platforms is a key cybersecurity priority for their
organization

• 25% say they do not know whether they have mechanisms in place to
locate where key data assets are processed and stored


Preventable identity-related breaches [2]: 


• 79% of organizations have experienced an identity-related security breach
in the last two years, and ...

• 99% believe their identity-related breaches were preventable.


The future is a hybrid world [3]:


• By 2025, there will be 55.7 billion connected devices worldwide, 75% of
which will be connected to an IoT platform.

• By 2023, 75% of the G2000 will commit to providing technical parity to a
workforce that is hybrid by design, rather than by circumstance, enabling
them to work together separately and in real time.


Mergers and acquisitions are increasing [4]: 


• Since 2000, more than 790,000 M&A transactions have been announced
worldwide with a known value of over US$ 57 trillion.


11.2 Protecting the Business Without Inhibiting 
Innovation and Pace 


In this digital reality, executives have five top priorities:


• Enable innovation and collaboration at pace: Today’s organizations extend
beyond traditional enterprise boundaries to external ecosystems, and so does
security. A modern approach across the continuum of security operations
enables the safe creation, operation and evolution of flexible, efficient and
collaborative ecosystems, and ensures seamless experiences.

• Reduce risk exposure and effectively manage risk: An insights-led
approach to risk management uses rich data to identify and manage risks
holistically across the enterprise in near real time, allowing for proactive
and comprehensive risk mitigation and fast response to threats. It includes
managing human and non-human digital identities and their secure access,
as well as advanced threat monitoring and response.

• Improve regulatory compliance: Data is everywhere and is fueling
innovation, new revenue opportunities, better user experiences, and optimized
operations. Ensuring the right access to this data is critical to complying with
increasingly strict regulations.

• Adopt a proactive stance through real-time situational awareness:
Critical to modern security operations is having the right processes, skills and
technologies. These technologies include advanced analytics, artificial
intelligence, machine learning, automation and orchestration of cybersecurity
workflows, as well as real-time visualizations of your vulnerability and threat
landscape.

• Be prepared and respond effectively when a crisis occurs: As cyber threats
and risks grow in volume and complexity, modern organizations are
prepared for crisis situations and are ready to respond effectively, while capturing
lessons learned.


What does success look like? Figure 11.1 answers this question with five
outcomes, each with a short explanation.


01 Confidently focus on growth: When thinking about your organization's
growth priorities, your CEO is confident that cyber threats and risks will not
impede progress. Security is perceived as a business enabler.

02 Secure digital innovations: When your organization launches an innovation
or digital initiative, you are confident it will advance and deliver results,
securely and at pace, with end-to-end security considerations integrated
from day 1.

03 Seamless experiences: Your customers and employees enjoy seamless
experiences that result from collaboration with your ecosystem partners using
multi-cloud environments, automation and hybrid workforces, without fear of
exposure to cyber attacks.

04 Proactive stance and crisis preparedness: You have a team that proactively
monitors threats and resolves incidents, without risk of financial or
reputational impacts. And if crisis situations do occur, you're prepared and
know how to respond quickly and appropriately.

05 Trusted identities and context-based access: Be it employees, customers,
suppliers, or even software robots and IoT devices, you're certain that they
have access to the right assets and information at the right time, for the
right reason.


Figure 11.1. What does success look like?


11.3 CGI Cybersecurity Advisory Services 


We highlight eight key advisory services to help clients achieve an insights-led 
balanced approach to cybersecurity in this new complex and connected digital 
reality [5]. 


11.3.1 Digital IAM Services 


The variety, volume and velocity of both human and non-human (or silicon) iden- 
tities (e.g., Internet of Things sensors, devices, software, artificial intelligence, micro 
services, and application programming interfaces) and their access needs are increas- 
ing dramatically. With our Digital IAM Advisory Services, you can achieve agility 
and innovation while keeping digital identities and their access to critical systems 
and data both secure and frictionless. Our services range from identity gover- 
nance and administration (IGA) strategy and roadmap development, to specific 
IGA advisory services for the new classes of silicon and external identities, to IAM 
operating model design, to IAM federation and integration across your enterprise 
and ecosystems. 


11.3.2 Secure Multi-Cloud Operations Advisory


Hybrid, multi-cloud environments are becoming the new normal, creating com- 
plex security environments. Our experts can advise you on how to integrate cloud 
services securely into your IT landscape. We start with a maturity and risk expo- 
sure assessment and then design a blueprint for building an operating model that 
secures your operations in a hybrid world. We accelerate this process by bringing 
pre-defined controls for Amazon Web Services, Microsoft Azure and Google Cloud, 
as well as a maturity model and reference architecture. 


11.3.3 Secure Automation Advisory 


We know that automation is a key enabler of cost and operational efficiencies, as 
well as an improved customer experience. Many enterprises seek to automate tasks 
and use artificial intelligence to drive that automation, and we can help you do 
this securely. Through this advisory service, we assess your automation maturity, 
including security aspects, using our maturity model. We also assess pain points, 
identify data privacy issues in processes (e.g., security calls in HR processes), and 
catalog your target systems. 


11.3.4 Digital Risk Management Advisory


The digital world comes with new risks—from evolved threats, to interconnected 
systems and technologies, to hybrid and perimeter-less work and IT environments, 
to complex data and privacy regulations. This requires much more dynamic, fluid 
and continuous risk management, crisis preparedness and rapid response. Our 
experts can help you manage your risks effectively, while ensuring you continue 
to deliver business outcomes at pace. Our services include integrated risk manage- 
ment programs, dynamic visualization of enterprise risks, privacy and compliance 
assessments, supply chain resiliency and risk management, cybersecurity crisis pre- 
paredness, and crisis response support. 


11.3.5 Digital Security Operations Modernization Advisory 


Security operations approaches of the last decade or even the last five years (e.g., 
pre-cloud, pre-smartphone, pre-artificial intelligence (AI), pre-bots, pre-Internet 
of Things/operational technology) are no longer viable. Today’s digital demands
require a fundamental change in security operations, whether evolutionary or trans- 
formative. Through our advisory services, we assess your current state of capabili- 
ties across tooling, processes and talent. This includes evaluating your environment 
scope, data sources, connectivity, logging and event streams, deep analytics and
AI, incident processing, threat intelligence, orchestration and automation, hunt
capabilities, and designs. We report our findings and jointly develop a moderniza- 
tion strategy and roadmap with prioritized practical initiatives (e.g., re-platforming, 
mentoring and upskilling/training). We also offer a hybrid “own vs. buy” advisory 
service and assist you in developing the supporting strategic business case. 


11.3.6 Cybersecurity Privacy by Design Framework


Easier access to development platforms means that more development is happening 
outside of the IT department (e.g., citizen developers and shadow IT). Enterprises 
increasingly seek greater connectivity and interoperability of the systems and ser- 
vices within their supply chains to improve efficiency, collaboration and the user 
experience. Data and privacy regulations and breaches have increasingly expensive 
consequences. All of these factors reinforce the fact that embedding cybersecurity 
and privacy into every project is much more efficient and effective than managing it 
as an afterthought. Security and privacy teams should establish standard, ready-to- 
use solutions for all IT and business projects. Our experts can assist you in building 
frameworks to achieve this level of readiness and reuse. After a thorough analysis 
of your current landscape, we recommend specific measures to fill gaps, including 
tooling advice and support. 


11.3.7 Security Service Center Design 


Increasingly, digital organizations require flexible access to new skills, retention of 
critical knowledge, and automation to ensure business continuity and resilience. 
We can work with you to design a security service center that meets modern needs, 
standardizes practices, and delivers the right level of expertise. We start by gaining 
an understanding of the services required, and then build a service catalog, design 
how to engage the service center, and establish a continuous improvement process. 


11.3.8 Security Operating Model Design


When you launch digital initiatives, introduce new organizational structures, or
undertake mergers, acquisitions, divestitures or carve-outs, your security target operating model
(TOM) must be adapted to ensure all processes and infrastructures reflect these 
changes. Our experts work with you to design and implement your TOM by 
assessing your as-is state, identifying weaknesses and gaps, designing a new model 
(including processes and governance), and gaining approval and acceptance. We 
use proven templates and best practices to accelerate the process. 


11.4 Cases in Point


Serving as access control broker for 10+ million industrial IoT 
digital assets for an industry-wide service 


For a large nationwide program involving the rollout of millions of industrial IoT 
digital assets, CGI designed, built, implemented, hosted, ran and supported the 
data services that lie at the heart of this program. Our IAM advisory services, along
with our security services, enable companies to access information to improve their
services and customer experiences. These IAM services are crucial to maintaining
the consumer confidence that underpins the nationwide program and rollout.

Our solution provides a high-availability, high-resilience communication service
in accordance with specifications, together with an access control function that
cryptographically validates all access requests and verifies right of access against
IoT registration data. It also includes an industry-wide federated identity provider
(IDP) service that enforces federated two-factor authentication for employees of
industry parties, asserts roles and privileges using SAML, and offers self-service
management by industry party administrators. In addition, the IDP service includes
effective management of privileged staff, management of risk in accordance with
ISO 27005, and delivery of associated security services.
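
The two checks performed by the access control function, cryptographic validation of the request followed by a lookup of the right of access in the registration data, can be illustrated with a minimal Python sketch. It is not the implementation used in the program: the chapter does not name the cryptographic scheme, so an HMAC-SHA256 tag over a shared secret is assumed here purely for illustration (a production broker would more plausibly rely on public-key signatures), and the names `AccessRequest`, `REGISTRATIONS`, `SECRETS`, `sign` and `authorize`, as well as all identifiers and secrets, are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Hypothetical registration data: device -> requester -> permitted actions.
REGISTRATIONS = {"meter-001": {"supplier-a": {"read"}}}

# Hypothetical shared secrets, one per industry party (demo values only).
SECRETS = {"supplier-a": b"demo-secret-a", "supplier-b": b"demo-secret-b"}


@dataclass(frozen=True)
class AccessRequest:
    requester_id: str
    device_id: str
    action: str
    signature: str  # hex-encoded HMAC-SHA256 tag over the three fields above


def sign(requester_id: str, device_id: str, action: str, secret: bytes) -> str:
    """Return the hex HMAC-SHA256 tag a legitimate requester would attach."""
    message = "|".join((requester_id, device_id, action)).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def authorize(request: AccessRequest) -> bool:
    """Grant access only if the tag verifies and the registration data permits it."""
    secret = SECRETS.get(request.requester_id)
    if secret is None:
        return False  # unknown requester
    expected = sign(request.requester_id, request.device_id, request.action, secret)
    # Step 1: cryptographic validation, using a constant-time comparison.
    if not hmac.compare_digest(expected, request.signature):
        return False
    # Step 2: right of access, checked against the registration data.
    allowed = REGISTRATIONS.get(request.device_id, {}).get(request.requester_id, set())
    return request.action in allowed
```

In this sketch a request is granted only when both checks pass: a correctly signed request from a party that is not registered for the device is rejected, as is a request from a registered party whose tag does not verify.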


Moving to the cloud securely and reliably


When a large aerospace and defense company sought to implement its public cloud 
migration strategy, data security and service reliability were of critical importance. 
Based on our significant experience in third-party vendor management, as well as 
managing cloud environments and their related risks, the client engaged us to assist 
in negotiating the security management aspects of its public cloud contracts. 

This included developing a standard security annex and contract clauses, analyz- 
ing cloud provider security practices, conducting negotiation workshops, and pro- 
viding a residual risk assessment. For the tailored security annex, we defined criteria 
for selecting applicable security requirements based on service type and identified 
process improvements. 

In addition to completing negotiations, the client now has a standard set
of requirements and a documented process to support future procurements,
including early involvement of the security team.


Innovation, collaboration, co-creation, experimenting and 
prototyping with partners and clients 


We invest in collaboration, innovation and knowledge exchange with internationally
recognized experts in the cybersecurity field, with the aim of enhancing our
cybersecurity knowledge, skills, services and approaches. An example is the Horizon
2020 European research and innovation partnership project Cyber-Trust, in which
CGI, together with 8 other partners from 7 European countries, joined forces to
develop an innovative, advanced cyber-threat intelligence, detection and mitigation
ecosystem [6]. We are also a regular contributor to various innovation fora on
cybersecurity topics, engaging our clients and partners.


11.5 Achieving a Balanced, Proactive, Insights-led 
Cybersecurity Approach 


We know that without the right cybersecurity and privacy protections, you face 
evolving risks and obstacles to innovating and collaborating effectively. Therefore, 
our goal is simple. We want to help you operate and transform at pace and with 
confidence—today and into the future. 

With 45 years of experience in securing critical business systems across a range 
of industries globally, our cybersecurity approach for today’s modern work envi- 
ronments has been tested and proven. Thanks to this experience, we bring acceler- 
ators in the form of maturity models, reference architectures, technical know-how, 
cross-domain expertise, risk management methods, and client lessons learned to 
accelerate and empower your business. 

By staying abreast of rapidly changing technologies, ecosystems and threats, our 
consultants work closely with you to understand your environment and needs. We 
help you to achieve the right balance between business agility and effective deter- 
rence, defense, detection and response capabilities. 

We stand ready to help you to secure your digital operations. 


Security and Privacy in Digital Twins


By G. Sargsyan 

The digital twin is one of the important topics in the world of digitalization and
is becoming increasingly relevant across different industries. Many debates explore
the growing importance of digital twins, including the possibility that they will
take control over humans, the difficulty of interacting with them, who their end
users are, their impact on society and sustainability and on making this world a
better place and, last but not least, their security and privacy aspects. This
chapter explores security and privacy in digital twins, based on the presentation
that the author, G. Sargsyan, gave at the event “Digital Twin a Promising Thing?”
on Oct 29, 2020 in Amsterdam, which was broadcast globally and was organized and
hosted by the Amsterdam University of Applied Sciences in collaboration with the
Digital Society School [1]. At this event the author shared her views on digital
twins for different industries and on the related risks, privacy, security and
ethical considerations, illustrated with practical examples; these views are
presented in this chapter. Recommendations on how to manage risks and security and
privacy concerns are also offered.


12.1 Today’s Digital Reality and What it means for
Cybersecurity 


Digital twins are virtual replicas of physical devices that, by combining data
science and IT, can be used to run simulations before actual devices are built and
deployed. They are also changing how technologies such as IoT, AI and analytics are
optimized. Although the term “digital twin” was first coined in 2002 [2], the
concept itself goes back further. In 1970, NASA pioneered the idea of working with
digital models of real-world systems during its Apollo missions. Being able to
create accurate simulations, based on real-world data, played a significant role in
helping NASA bring its astronauts safely back to Earth following equipment failure
on Apollo 13 [3].

Nowadays, digital twins are becoming a business imperative, covering the entire 
lifecycle of an asset and forming the foundation for connected products and ser- 
vices. Companies that fail to respond will be left behind. 

A tremendous amount of market research has been conducted on digital twins; a few
selected facts and figures follow. According to a MarketsandMarkets report, the
digital twin market is expected to grow from $3.1 billion in 2020 to $48.2 billion
by 2026, at a CAGR of 58% from 2020 to 2026, with healthcare and defense among the
largest adopters [4]. Gartner argues that by 2021, half of large industrial
companies will use digital twins, resulting in those organizations gaining a 10%
improvement in effectiveness [5]. With the number of connected devices forecast to
grow to 42bn by 2025, according to research group IDC, we are rapidly entering the
era of “hyper-data”. Each of those devices emits a constant stream of data,
enabling us to build a digital cloud that will metaphorically encircle our planet.
We can, to use the jargon, create “digital twins” of the real world [6]. These
reports make it evident that digital twins will transform the world, and that
businesses need to act to stay relevant and not miss their opportunities.
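
The market figures quoted above can be sanity-checked with the standard compound annual growth rate formula. The short Python sketch below is only an illustration; the function names `cagr` and `project` are our own and are not taken from the cited report. It computes the rate implied by growth from $3.1 billion to $48.2 billion over the six years from 2020 to 2026 and, conversely, the value reached when $3.1 billion compounds at 58% for six years.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by a start value, an end value and a period."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Value reached when start_value grows at a constant annual rate."""
    return start_value * (1 + rate) ** years


# Figures in billions of US dollars, for the period 2020 to 2026 (six years).
implied_rate = cagr(3.1, 48.2, 6)
projected_value = project(3.1, 0.58, 6)

print(f"Implied CAGR: {implied_rate:.1%}")
print(f"Projected 2026 market size: ${projected_value:.1f} billion")
```

Both directions agree with the quoted numbers to within rounding, which confirms that the 58% rate refers to a six-year period starting in 2020.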


12.2 Cases in Point 


I highlight three cases in point from business and the real world that bear on
security and privacy in digital twins.


12.2.1 Smart City


Imagine the challenges associated with moving a city. This is the reality faced 
by Sweden’s northernmost city, the mining town of Kiruna. To continue the safe 
growth of mining, an industry central to the city’s economy and culture, Kiruna
and its 18,000 residents are moving 3 kilometers east. While new homes and a new 
city center are built, some of Kiruna’s most historic buildings, such as the Kiruna 
Church, recognized as one of Sweden’s most popular and beautiful wooden build- 
ings, will be physically moved to the new city center. 

To enable the world’s largest municipal relocation, Kiruna needed an innovative 
approach, and city managers established the Kiruna Sustainability Center (KSC) to 
develop and test new ideas for sustainable solutions. The KSC brings together an 
ecosystem of municipalities, industry experts, researchers, universities and citizens 
in an effort to drive greater innovation and new business opportunities. 

During the initial phases of the Kiruna relocation, CGI helped the city of Kiruna
to devise an innovative concept called Hidden City, which uses Microsoft HoloLens
augmented reality in combination with geographic information system (GIS)
equipment and data to digitally map and visualize the underground infrastructure.
The project is pioneering the outdoor use of HoloLens, which by design is made
to be used indoors. For Kiruna, Hidden City provides an accurate image of the
underground before infrastructure repairs start [7].

Hidden City was a finalist in the “innovative idea” category of the World Smart 
City Awards 2018 and finalist for the “best innovator” award at the Kiruna City 
business awards. Kiruna and CGI also have been featured by Microsoft in its
customer story “Moving a city with the help of Microsoft HoloLens” [8].


12.2.2 Transport: Rail 


Despite substantial investments in the Betuweroute and the port railway, the vol- 
ume of goods on the Dutch railways has been virtually stable for about 15 years, 
while the other hinterland transport modalities for goods (truck & barge) continue 
to grow (source: CBS). This is surprising at a time when sustainability is rightly
becoming increasingly important. The Ministry of Infrastructure and Water Management in
The Netherlands has therefore expressed the ambition that rail freight transport 
should have doubled by 2030. 

For better and more efficient business process management, CGI helped ProRail to
develop, test and introduce innovations, including the creation of a Digital
Twin [9, 10]. The world of rail network manager ProRail is outdoors, and a lot is
happening there: the network performs all kinds of tasks in different places at
the same time. Measurements provide digital information that is collected,
cleansed, modeled and combined. A Digital Twin is generated from that information;
it is in fact a digital representation of the real world. But it is more than
that. The Digital Twin also represents planned or designed objects and objects
that have already vanished. It thus covers the entire life cycle of the object
structure and the associated information management. Moreover, use of the network
is part of the 5D world of ProRail. 5D is a combination of 3D location-bound
information with a time registration and the level of detail of the product. The
Digital Twin in turn serves as the basis for the information systems with which
ProRail manages its core processes.
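
The 5D idea described above, a 3D location combined with a time registration and a level of detail, can be made concrete with a small data-model sketch in Python. This is an assumption-laden illustration, not ProRail's actual data model: the class `TwinObject`, its field names, the function `objects_at` and the sample objects are all hypothetical. The validity interval is what lets the twin hold planned objects and objects that have already vanished alongside those that currently exist.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TwinObject:
    name: str
    location: Tuple[float, float, float]  # dimensions 1-3: x, y, z position
    valid_from: date                      # dimension 4: start of existence
    valid_to: Optional[date]              # dimension 4: end of existence (None = open-ended)
    level_of_detail: int                  # dimension 5: 1 = coarse, higher = finer


def objects_at(objects: List[TwinObject], when: date, max_detail: int) -> List[TwinObject]:
    """Return the objects that exist on a given date, up to a given level of detail."""
    return [
        obj for obj in objects
        if obj.valid_from <= when
        and (obj.valid_to is None or when < obj.valid_to)
        and obj.level_of_detail <= max_detail
    ]


# Hypothetical sample data: a vanished, an existing and a planned object.
network = [
    TwinObject("old-signal", (10.0, 2.0, 0.0), date(1990, 1, 1), date(2015, 6, 1), 1),
    TwinObject("switch-12", (11.5, 2.0, 0.0), date(2005, 3, 1), None, 2),
    TwinObject("planned-track", (40.0, 5.0, 0.0), date(2030, 1, 1), None, 1),
]

names_2010 = [obj.name for obj in objects_at(network, date(2010, 1, 1), max_detail=2)]
names_2031 = [obj.name for obj in objects_at(network, date(2031, 1, 1), max_detail=2)]
```

Querying the same collection for different dates yields the historical, current or planned state of the network, while the level-of-detail filter controls how fine-grained the returned view is.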


12.2.3 Aerospace and Defense 


The aerodynamics of a fighter jet are so complex that computer simulations
quickly reach their limits. As a result, BAE Systems (formerly British Aerospace)
is creating 3-D printed models for supersonic wind tunnel tests to refine the shape
of the aircraft. The digital twin concept will be used to design, test and support
every single system and structure for Tempest, which is scheduled to enter active
service in 2035. Still in the concept phase, Tempest will be one of the first
sixth-generation (6G) fighters and is designed to complement current combat
aircraft. It will have configurable artificial intelligence and cyber-hardened
communications that allow the aircraft to act as a flying command and control
center, with the pilot acting more as an executive officer than as a dogfighter.
By taking an entirely digital approach, BAE Systems is also transforming the way
the organisation works: it achieved in a number of days what traditionally would
have taken a number of months. As a result, it is working faster for the future,
encouraging open-mindedness and innovation [11].


12.3 Risks, Security, Privacy and Ethics 


All of the applications discussed above involve an element of risk. Let us now look
at the potential risks and challenges that a digital twin can expose you to from a
security and privacy perspective. The obvious concerns are security, privacy,
surveillance and ethics, which need to be addressed before these systems are
deployed. Wider application of the digital twin concept creates ethical challenges
as well as technical ones.
Companies usually own the assets they use in their factories. Once you have sold a 
physical product to a customer, who owns the rights to its digital twin? Concerns 
over privacy and the potential misuse of data are already widespread in the worlds 
of e-commerce and social media. Now consumers are raising the same questions 
about the growing number of connected products in their lives. Consumer rights 
advocates are already raising questions about the use of connected toys that collect 
data on the behavior and preferences of their users, for example. 

The ethical, privacy and societal implications of digital twins are another vital
dimension that needs attention. So far, speculation about the ethical, privacy and
legal provisions for regulating the development and use of digital twins has been
based on the concept of the physical and the digital twin remaining separate
entities, as the term “twin” itself suggests. Responsibility, ethics, decency and
morality will not only experience a renaissance; they will need to be imbued with
immense significance, for this is a matter of data and transparency.


12.4 Digital Twins Security Drivers, Concerns and 
How to Manage 


A security and privacy by design approach is becoming the norm in today's complex
digital environments. By operationalising this approach, security can become a
vital enabler of trust in the operation of products and assets that use digital
twins. The digital twin can become the full driver of communication and
collaboration across the organisation's entire digital thread; in other words, it
can become a framework to unify and orchestrate data across a product’s life cycle.
This can happen only if the right security policies and technologies are selected,
applied and maintained to preserve digital trust. Participants can then collaborate
and safely operate products, assets and processes through digital twins, solely
within an authenticated and trusted ecosystem.

As with any digital security strategy, consistent updating of technologies and 
policy is critical so the organisation can stay one step ahead of cyber criminals, 
and securing the multiple endpoints of products, assets and processes will require 
a complex, multi-layered, distributed approach to security. 

Organisations that want to create or improve their digital twin initiatives,
projects or programmes, and to ensure the success of their digital transformation
in general, can count on the security team. The security team now has the
opportunity to position itself as a business enabler that drives innovation and
business outcomes. It can thus become the guarantor of digital trust, embedding
security by design not only in the digital twin initiatives but also throughout the
organisation's culture, practices, processes and platforms.

The safe inclusion of the whole ecosystem and supply chain in the digital twin
will be crucial, as all partners will need to be part of the model for it to
function properly. While engaging and collaborating with all stakeholders has its
own challenges, it is critical that the parties collaborate effectively so that
they can manage the security and privacy risks and succeed.